If you host Git repositories, you might want to set up a cron job that automatically triggers garbage collection on the server side. Regular users usually can't access the unreachable objects anyway, so there's no point in keeping them.
More importantly, invoking git gc also packs loose objects together.
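You can watch this happen with git count-objects. A minimal check on a hypothetical repository (the path is just an example):

# "count" is the number of loose objects, "in-pack" the number of objects
# stored in packfiles; after git gc, loose objects should have moved into a pack.
cd /var/git/repositories/example.git
git count-objects -v
git gc
git count-objects -v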
This has a huge advantage: when a user clones a whole repository, Git compresses all objects into a single packfile and transfers it via the Git protocol. If all objects already live in one packfile, there is no overhead for building a temporary packfile. (If a client only asks for a subset of commits, it's also easier for Git to "thin out" an existing packfile.)
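If you want to verify this on the server, you can check that a bare repository ends up with (typically) a single packfile after a gc run; again, the path is just an example:

# After git gc, objects/pack usually contains just one *.pack file
# (plus its *.idx index).
ls /var/git/repositories/example.git/objects/pack/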
You can usually tell that a computationally expensive temporary packfile is being built when a message like remote: Counting objects ... keeps counting for a while. With some hosting providers this takes quite some time, because the server is under high load.
I use the following script to trigger git gc every night:
#!/bin/sh
# Run git gc in every bare repository below $BASE.
BASE=/var/git/repositories

# Switch to the git user so repacked objects keep the right ownership.
su - git -c "
    cd $BASE
    find . -name '*.git' -type d | while read -r repo; do
        cd \"$BASE/\$repo\" && git gc >/dev/null 2>&1
    done
"
You can omit the su part if the script itself runs as the owner of your Git repositories.
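For completeness, a possible crontab entry for the nightly run, assuming the script is saved as /usr/local/bin/git-gc-all (a hypothetical path) and registered in root's crontab, since it uses su:

# m  h  dom mon dow  command
0    3  *   *   *    /usr/local/bin/git-gc-all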
Update: If you don't use cgit's age files, all your repositories will show up as recently changed in the "idle" column, since git gc updates file modification times that cgit otherwise falls back to. To work around this, include the following command after the git gc call:
mkdir -p info/web &&
git for-each-ref \
    --sort=-committerdate \
    --format='%(committerdate:iso8601)' \
    --count=1 'refs/heads/*' \
    > info/web/last-modified
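This writes the newest commit date across all branches to info/web/last-modified, cgit's default agefile location. It has to run inside each repository, so in the nightly script above the loop body would be extended roughly like this (same \$-escaping as before, since it sits inside the su block):

cd \"$BASE/\$repo\" && git gc >/dev/null 2>&1 &&
mkdir -p info/web &&
git for-each-ref \
    --sort=-committerdate \
    --format='%(committerdate:iso8601)' \
    --count=1 'refs/heads/*' \
    > info/web/last-modified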