Monday 24 September 2012

SpiderOak for system backup

We decided to use SpiderOak for our off-site system backups because of price and privacy. For 200GB storage it's $200 a year, which compared well with Dropbox at the time (although now they are the same price). However, SpiderOak charge for the storage that they use after compression and deduplication, which means I'm currently storing 243GB, and I've still got around 40GB of my 200GB unused.
They also have their famous "zero-knowledge" thing – provided you never tell them your password, they can't decrypt your files. But in an emergency you can log onto their web interface and still get at the files – so from a disaster recovery point of view, this beats encrypting the files separately.

Installing SpiderOak

SpiderOak helpfully provide an Ubuntu repository. However, when SpiderOak is installed, it adds its own entry to /etc/apt/sources.list.d leading to complaints from apt:
W: Duplicate sources.list entry
I elected to just install the .deb where I needed it, and rely on it to set up the repository for itself. Somehow it seems the wrong way round.
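If you want to see which entry apt is complaining about, a rough sketch like the following lists repository lines that appear more than once. The function name dup_apt_entries is my own invention, and it only catches lines that are character-for-character identical, so entries that differ in whitespace will slip through.

```shell
# List "deb" / "deb-src" lines that occur more than once across the given files.
# dup_apt_entries is a hypothetical helper name, not part of apt or SpiderOak.
dup_apt_entries() {
    # -h suppresses file names so identical lines from different files compare equal;
    # uniq -d prints one copy of each line that is repeated.
    grep -h '^deb' "$@" 2>/dev/null | sort | uniq -d
}
```

Something like `dup_apt_entries /etc/apt/sources.list /etc/apt/sources.list.d/*.list` should then show the repeated line, if there is one.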

SpiderOak at the command line

SpiderOak's command line is generally excellent and well-documented. It isn't really designed for system-wide use, though. It stores all of its configuration in ~/.SpiderOak, so if you want to run it as root (as I do), you need to make sure that $HOME is set correctly. This means running all of the commands as
sudo -H SpiderOak ... 
The biggest gotcha with this is that if you forget the -H during setup, it will work fine, putting the configuration in /home/your-id/.SpiderOak. It's only when you come to run SpiderOak from an init script, or from another user account, that you have trouble.
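One way to check where the configuration actually ended up is to look for a .SpiderOak directory under each candidate home directory. This is only a sketch: find_spideroak_configs is a name I made up, and it assumes the directory sits directly under the home directory, as described above.

```shell
# Print every .SpiderOak directory found directly under the given home directories.
# find_spideroak_configs is a hypothetical helper, not a SpiderOak command.
find_spideroak_configs() {
    for home in "$@"; do
        if [ -d "$home/.SpiderOak" ]; then
            echo "$home/.SpiderOak"
        fi
    done
}
```

Saved in a script that ends with `find_spideroak_configs /root /home/*` and run with sudo (so that /root is readable), it should print only /root/.SpiderOak if setup was done with -H. A second line under /home suggests the -H was forgotten at some point.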

Init script for SpiderOak

Anthony Mercatante has a PPA of SpiderOak support packages which adds an init script and a default file for SpiderOak. I adapted these to make logging a bit more configurable (based on a couple of forum posts, here and here), and you can get my versions from GitHub.
These scripts allow you to do a
sudo service spideroak start
but also provide a rather handy
sudo service spideroak log-spider 
for tailing the current log file – since it changes its name according to date and lives in a non-readable directory, breaking tab completion.

Allowing SpiderOak to monitor files

SpiderOak cleverly uses Linux's inotify to monitor for changed files. It sets up a watch on every directory in your backup set, so that the kernel can notify it when a file is added, changed or deleted. This works fine provided you have your limits set high enough so that every directory can be monitored. If not, the SpiderOak inotify_dir_watcher process exits, and SpiderOak falls back on searching the tree periodically.
On installation, SpiderOak includes a file, /etc/sysctl.d/30-spideroak.conf, which sets the limit:
fs.inotify.max_user_watches = 65536
Though this wasn't enough for me. A quick check on the number of folders via
sudo find /srv/image -type d | wc -l
revealed that a more suitable figure would be
fs.inotify.max_user_watches = 262144
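The sizing step can be sketched as a small set of shell functions: count the directories, then round up to give a figure for the sysctl file. Rounding up to the next power of two is just my own assumption about how to pick a tidy number, not anything SpiderOak prescribes, and the function names are made up for illustration.

```shell
# Count the directories under a tree; SpiderOak sets one inotify watch per directory.
count_dirs() {
    # One output line per directory; tr strips padding that some wc versions add.
    find "$1" -type d | wc -l | tr -d ' '
}

# Round a count up to the next power of two (an arbitrary rounding choice).
next_power_of_two() {
    n=1
    while [ "$n" -lt "$1" ]; do
        n=$((n * 2))
    done
    echo "$n"
}

# Print a sysctl line sized for the given tree.
suggest_watch_limit() {
    echo "fs.inotify.max_user_watches = $(next_power_of_two "$(count_dirs "$1")")"
}
```

Run as root against the backup root (for example `suggest_watch_limit /srv/image`) and compare the result with the current value in /proc/sys/fs/inotify/max_user_watches. Since the limit is per user and other programs may also hold watches, it is probably worth leaving some headroom. After editing the file, `sudo sysctl -p /etc/sysctl.d/30-spideroak.conf` should load the new value without a reboot.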