Julius Plenz – Blog

gc via cron on hosted git repos

If you host Git repositories, you might want to set up a cron job that automatically triggers garbage collection on the server side. As a regular user you can't usually access the unreachable objects anyway, so there's no point in keeping them.
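If you want to check what such a run would clean up, you can inspect a repository by hand first. A minimal sketch, assuming a bare repository under the example path below:

cd /var/git/repositories/example.git

# Objects not reachable from any ref, tag or reflog entry; git gc
# prunes these once they are older than gc.pruneExpire (two weeks
# by default).
git fsck --unreachable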

More importantly, when invoking git gc, Git packs loose objects together. This has a huge advantage: when a user clones a whole repository, Git compresses all objects into a single packfile and transfers it via the Git protocol. If all the objects are already in one packfile, there's no overhead in creating a temporary packfile. (If you just want to fetch a subset of commits, it's easier for Git to "thin out" the existing packfile, too.)
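To see the packing effect, compare the object store before and after a manual run. Again a sketch with an example path; in a bare repository the packs live directly under objects/pack/:

cd /var/git/repositories/example.git

git count-objects -v   # "count"/"size" are loose objects, "in-pack" the packed ones
git gc                 # repack everything into (ideally) a single packfile
ls objects/pack/       # ideally a single *.pack/*.idx pair remains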

You can usually tell that a computationally expensive temporary packfile is being created when a message like remote: Counting objects ... keeps counting for a while. With some hosting providers this takes quite some time, because the server is under high load.

I use the following script to trigger git gc every night:

#!/bin/sh

# Root of all hosted repositories.
BASE=/var/git/repositories

# Run as the "git" user so repacked files keep the right owner.
su - git -c "
cd $BASE
find . -name '*.git' -type d | while read repo; do
    cd $BASE/\$repo && git gc >/dev/null 2>&1
done
"

You can omit the su part if the script is run directly by the owner of your Git repositories.
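For reference, a crontab entry along these lines runs it every night; the script path and the time are just placeholders for whatever you use:

# root's crontab (crontab -e): trigger repository gc at 03:00 every night
0 3 * * * /usr/local/sbin/git-gc-repos.sh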

Update: If you don't use cgit's age files, all your repositories will show up as recently changed in the "idle" column after such a gc run. To work around this, include the following commands after the git gc call:

mkdir -p info/web &&
git for-each-ref \
    --sort=-committerdate \
    --format='%(committerdate:iso8601)' \
    --count=1 'refs/heads/*' \
    > info/web/last-modified
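Inside the su invocation of the script above, the loop body then becomes something like this (with the same \$repo escaping as before); info/web/last-modified is cgit's default agefile path, so no extra cgitrc setting should be necessary:

cd $BASE/\$repo &&
git gc >/dev/null 2>&1 &&
mkdir -p info/web &&
git for-each-ref \
    --sort=-committerdate \
    --format='%(committerdate:iso8601)' \
    --count=1 'refs/heads/*' \
    > info/web/last-modified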

posted 2011-10-12 tagged git