Background
Having lost data in the past, and having spent time hacking on storage, I'm
convinced that regular backups are a good idea. I wanted redundancy in case my
local server failed, and I wanted to encrypt my backups using a
password-protected gpg key.
The current solution uses a passphrase kept in plain text outside of the
backup path. I plan to investigate moving the gpg key to a smartcard and
using a PIN to unlock it instead. If anyone has any additional solutions,
please describe them in detail.
Persisting requisite environmental variables
Running anything from cron detaches it from your current environment: you lose
all of the variables describing things like your ssh-agent and gpg-agent, which
you need in order to communicate with the remote server.
I took a simple approach: in my ~/.bashrc I added the following.
cat > ~/.backenvrc << EOF
# used by crontab backup script
export SSH_AGENT_PID=$SSH_AGENT_PID
export SSH_AUTH_SOCK=$SSH_AUTH_SOCK
export GPG_AGENT_INFO=$GPG_AGENT_INFO
export GPGKEY=XXX-insert-your-gpg-key-here-XXX
EOF
I then simply source this file from the backup script referenced in my crontab.
I only need to log in once to populate it.
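Because the file is only rewritten at login, the values in it can go stale if the agent exits later. A minimal sketch of a sanity check the backup script could run after sourcing ~/.backenvrc follows. The function names socket_ok and check_backup_env are hypothetical, and the check only confirms that the path exists and is a socket, not that an agent is actually answering on it.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given path is non-empty and is a
# Unix socket file. It does not prove that an agent is listening on it.
socket_ok() {
    [ -n "$1" ] && [ -S "$1" ]
}

# Hypothetical check to call right after `source $HOME/.backenvrc`.
check_backup_env() {
    if ! socket_ok "${SSH_AUTH_SOCK:-}"; then
        echo "ssh-agent socket missing or stale: '${SSH_AUTH_SOCK:-}'" >&2
        return 1
    fi
    return 0
}
```

Calling `check_backup_env || exit 1` early in the script would make a stale environment show up in the nightly mail as a clear message rather than as an ssh failure further down.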
Setting up the Crontab
# crontab -l
# m h dom mon dow command
MAILTO=ppetraki@localhost
BACKUP=/home/ppetraki/Documents/System/Backup
#
0 0 * * * /usr/bin/crontab -l > $BACKUP/crontab-backup
0 0 * * * /usr/bin/dpkg --get-selections > $BACKUP/installed-software
0 0 * * * /usr/local/bin/ppetraki-backup.sh inc
0 0 * * Fri /usr/local/bin/ppetraki-backup.sh full
Note that I am also backing up my crontab and my list of installed software.
Eventually I will move this into another script that also does things like:
- back up my bookmarks from chrome and firefox
- back up mail in a non-binary format
The current cron format performs an incremental backup every night and a
full backup every Friday.
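One thing to note about the schedule above: the `0 0 * * *` entry also matches Fridays, so on that day the incremental and the full job both start at midnight. One possible way to avoid the overlap, sketched here under the assumption that a single cron entry calls the driver script, is to pick the action from the day of the week. The function name pick_action is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: choose the backup type from the ISO day of week
# (1 = Monday ... 7 = Sunday, as printed by `date +%u`).
# An explicit day number can be passed as $1; otherwise today is used.
pick_action() {
    local dow="${1:-$(date +%u)}"
    if [ "$dow" -eq 5 ]; then
        echo full    # Friday: full backup
    else
        echo inc     # any other day: incremental
    fi
}

# Possible use in the driver script: fall back to the computed action
# when no argument is given.
#   action=${1:-$(pick_action)}
```

Doing the date logic inside the script rather than in the crontab line also sidesteps cron's special handling of the `%` character.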
Driver script
This wraps the invocation of duplicity and sets up the necessary environment
variables. Duplicity itself can be hairy with all its command line switches,
and even more of a burden if you have multiple targets. I have redundant backups:
first to a local server and then to a remote service provided by rsync.net (great
customer support!). I found horcrux to be a wonderful, lightweight duplicity
wrapper that suits my needs.
The driver script, which is external to my backup path, also contains my GPG passphrase
to encrypt my backups. Eventually I wish to move to a smartcard driven system
illustrated here
#!/bin/bash
# [/usr/local/bin/ppetraki-backup.sh]
export PATH=$PATH:/usr/local/bin
action=$1
export USER=XXX
export HOME=/home/$USER
source $HOME/.backenvrc
echo "verifying environment"
echo "gpg-agent: ${GPG_AGENT_INFO}"
echo "gpg-key: ${GPGKEY}"
echo "ssh-agent-pid: ${SSH_AGENT_PID}"
echo "ssh-auth-sock: ${SSH_AUTH_SOCK}"
if [ -z "$action" ]; then
echo "requires an action!"
exit 1
fi
export PASSPHRASE=
[ -z "$PASSPHRASE" ] && exit 1
echo "begin"
for config in local_backup remote_backup
do
horcrux clean $config
horcrux $action $config
done
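As an interim step before a smartcard, the passphrase could live in its own file rather than in the script body. The following is only a sketch: the function name read_passphrase and the file location are hypothetical, and `stat -c` assumes GNU coreutils (as found on a Debian-style system). The passphrase is still plain text on disk; this just refuses to use a file that group or others can access.

```shell
#!/bin/bash
# Hypothetical helper: print the first line of a passphrase file, but only
# if the file is readable and its mode is exactly 600 or 400.
read_passphrase() {
    local file="$1" mode
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    mode=$(stat -c '%a' "$file") || return 1   # GNU stat: octal permissions
    case "$mode" in
        600|400) ;;
        *) echo "refusing $file: mode $mode, expected 600 or 400" >&2
           return 1 ;;
    esac
    head -n 1 "$file"
}

# Possible use in the driver script (path is an assumption):
#   PASSPHRASE=$(read_passphrase "$HOME/.backup-passphrase") || exit 1
#   export PASSPHRASE
```

The file would need to sit outside the backup path, just like the driver script does today.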
Using horcrux to wrangle duplicity
Horcrux has the notion of profiles, which take all the complexity out of managing
the duplicity CLI. Here's an example of a profile.
cat /home/ppetraki/.horcrux/local_backup-config
destination_path="rsync://192.168.1.XXX/backups/personal"
cat ~/.horcrux/local_backup-exclude
- /home/ppetraki/Sandbox
- /home/ppetraki/Bugs
- /home/ppetraki/Downloads
- /home/ppetraki/Videos
- /home/ppetraki/.xsession-errors
- /home/ppetraki/.thumbnails
- /home/ppetraki/.local
- /home/ppetraki/.gvfs
- /home/ppetraki/.systemtap
- /home/ppetraki/.adobe/Flash_Player/AssetCache
- /home/ppetraki/.thunderbird
- /home/ppetraki/.mozilla
- /home/ppetraki/.config/google-googletalkplugin
- /home/ppetraki/.config/google-chrome
- /home/ppetraki/.cache
- /home/ppetraki/**[cC]ache*
I found it problematic to back up only subdirectories of things like mozilla
and google-chrome. Instead, I will write an additional script to cherry-pick
those files for backup.
The main horcrux config file
cat ~/.horcrux/horcrux.conf
source="/home/ppetraki/" # Ensure trailing slash
encrypt_key=XXXXXX # Public key ID to encrypt backups with
sign_key='-' # Key ID to sign backups with (leave as '-' for no signing)
use_agent=false # Use gpg-agent?
remove_n=3 # Number of full filesets to remove
verbosity=5 # Logs all the file changes (see duplicity man page)
vol_size=25 # Split the backup into 25MB volumes
full_if_old=30D # Cause 'full' operation to perform a full
# backup if older than 30 days
backup_basename='backup' # Directory name for local backups (i.e., destination
# /Volumes/my_drive/backup/ or /media/my_drive/backup/)
dup_params='--use-agent' # Parameters to pass to Duplicity
This is great as it reduces a backup invocation to this:
$ horcrux inc local_backup
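For comparison, here is a rough idea of what typing the same backup by hand might look like. This is a sketch only: the mapping from the horcrux settings to duplicity flags is an assumption and is not taken from the horcrux source, and the function name print_duplicity_cmd is hypothetical. The flags themselves (`--encrypt-key`, `--exclude-filelist`, `--volsize`, `--verbosity`, `--use-agent`) are standard duplicity options. The function prints the command instead of running it.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) a duplicity command line built from
# settings like the ones in horcrux.conf. The flag mapping is an assumption.
print_duplicity_cmd() {
    local action="$1" src="$2" dest="$3"
    local encrypt_key="$4" exclude_file="$5"
    local cmd=(duplicity "$action"
               --encrypt-key "$encrypt_key"
               --exclude-filelist "$exclude_file"
               --volsize 25
               --verbosity 5
               --use-agent
               "$src" "$dest")
    printf '%s ' "${cmd[@]}"
    printf '\n'
}

# Example with the placeholder values from the configs above.
print_duplicity_cmd incremental /home/ppetraki/ \
    "rsync://192.168.1.XXX/backups/personal" XXXXXX \
    "$HOME/.horcrux/local_backup-exclude"
```

Multiply that by two targets and a cleanup step and the appeal of a profile-based wrapper becomes obvious.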
Monitoring
I defined MAILTO in my crontab, installed mutt, and then reconfigured
postfix for local mail delivery. Every night I get a progress report on how
the backups ran.
Conclusion
I've spent quite a bit of time determining how to automate this and provide
strong encryption. If you have a more secure way to encrypt the backups, I would
be happy to hear it.