Since I started hosting some of my git repositories on my own server instead of GitHub, I wanted a backup strategy for them. I see several possibilities:

  • do not back up data on the server at all, because all local repositories together should contain all the information
  • mirror data on the server to a secondary git repository
  • copy the folders somewhere else where they cannot be accessed as a repository, e.g. into a compressed archive

In my opinion the first option is not the best one, because you might have to restore the state from a lot of different PCs. The simplest such scenario: you work on branch A on host1, on branch B on host2 and so on, so no single machine holds the complete state.

The second option is a good one, because it also allows you to continue pushing and pulling changes in case your main remote fails. However, I did not have a second server, and AWS CodeCommit is more expensive than the option I chose.
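For reference, mirroring to a secondary remote (the second option) needs only a bare repository on the backup host and a mirror push. A minimal sketch, using temporary directories as stand-ins for the two hosts (all paths and names here are placeholders):

```shell
set -eu
tmp=$(mktemp -d)   # stands in for both hosts in this sketch

# a stand-in for the primary repository on the main server
git init -q "$tmp/primary"
git -C "$tmp/primary" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# one-time setup on the backup host: a bare repository that receives the mirror
git init -q --bare "$tmp/mirror.git"

# register the backup remote and push all refs (branches and tags)
git -C "$tmp/primary" remote add backup "$tmp/mirror.git"
git -C "$tmp/primary" push -q --mirror backup
```

With `--mirror`, the push also propagates deleted branches, so the backup stays an exact copy of the primary repository's refs.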

I opted for the third option and created a small script that runs as a cron job, which sends all git repositories as a compressed archive to S3 storage. Even if you mirror a repository to another one, you'd still want to implement daily backups, because they allow you to return to an older state in case your repository becomes corrupted for some reason.
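Scheduling the daily run is a single crontab entry; the script path and log file below are placeholders, not part of my actual setup:

```
# run the backup script every night at 03:30
30 3 * * * /home/git/bin/backup-repos.sh >> /var/log/git-backup.log 2>&1
```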

All the script really does is put all files into an archive named with the current timestamp and transmit it to S3. I currently do not have a deletion strategy for old backups, but I assume it's fine to simply delete them after one month (rather than following a sophisticated strategy where retention spans grow as the backups get older).

#!/usr/bin/env bash
set -euo pipefail

# $bucket is expected to be set to the name of the target S3 bucket
now=$(date +"%Y%m%d-%H%M%S")
filename="git-${now}.tar.gz"
filepath="/tmp/${filename}"

# archive all repositories under /home/git, upload via the AWS CLI container, clean up
tar zcf "$filepath" -C /home/git git
docker run -v "$filepath":/home/aws/backup.tar.gz --rm awscli s3 cp /home/aws/backup.tar.gz "s3://${bucket}/git/${filename}"
rm "$filepath"

This first creates the archive with the current timestamp in the file name, then calls the AWS CLI (inside a Docker container) to upload the file to AWS S3, and finally deletes the local archive again.
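The one-month deletion mentioned above does not even need script logic: S3 lifecycle rules can expire old objects automatically. A sketch using the AWS CLI (the rule ID is a placeholder; the `git/` prefix matches the upload path in the script):

```
aws s3api put-bucket-lifecycle-configuration \
  --bucket "$bucket" \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-git-backups",
      "Filter": { "Prefix": "git/" },
      "Status": "Enabled",
      "Expiration": { "Days": 30 }
    }]
  }'
```

Once the rule is in place, S3 deletes each backup 30 days after it was uploaded, with no further involvement of the backup script.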