Thursday Night

Paul Betts’s personal website / blog / what-have-you

Amazon S3 backup just saved my butt

My LVM volume

As I sit here waiting for dd_rescue to try to salvage what’s left of my 500GB hard drive, and wondering what will become of the 1.75 TB LVM volume that is now toast, I count my lucky stars that I decided a month ago to finally sort out the new backup system I’d been thinking about – had I not done that, 5000+ photos, including the ones from our wedding, would’ve been down the drain. I feel like I only finished backing up everything yesterday, and it’s already saved me from disaster.

Let me take a step back. For years, I’ve used Logical Volume Manager to manage the volume that I store my movies and TV shows on. The advantage of LVM is that it abstracts away the physical drives – to my apps, it’s just one large hard drive, but I can use the admin tools to add and remove drives without reformatting.

That’s all well and good, but the big caveat to running drives like this is that if any one of them fails, the whole volume is gone. I knew this going into it, so I made sure that any data I really cared about had another copy somewhere.

So originally, I had a setup which ended up rsync’ing to this website, which is hosted on Linode. I very much recommend them, by the way – I’ve used them for years, the support is fantastic, and the site is pretty powerful. That worked fine until I bought my wife a DSLR, and her great but giant photos caused my 4GB of disk space to vanish. So for a long time, I just gave up on backup, and since the photos folder is so gigantic, the only place I could put it was on the previously mentioned storage volume, symlinked to where it should be. Bad developer, no Twinkie!

Why back up using Amazon S3

Amazon S3 as a user-friendly backup solution sucks – it’s extremely developer-centric, and it’s about as friendly as the tram drivers in Berlin, who afaik are some of the least friendly/helpful people on the planet.

However, here’s why it’s great – it’s cheap, and it has no limits. All of the free file storage / backup services top out at ~4 GB or so, and the paid services start out at $50/yr for more than enough space for us. I’ve backed up 20GB on S3 so far, and I’ve incurred about $3.50 in charges. For periodic automated backup, it’s hard to beat S3 as a backend.

If you’re a programmer, just think of Amazon S3 as basically a giant hash table (or Dictionary<string, byte[]> for you .NET people). First, you set up a bucket, which is an instance of the table – you won’t have more than a few of these, and probably need only one for backup. These bucket names are shared among everyone, so you have to make them unique – I just prefix my username to it since I don’t intend to share the files out over the web. Since S3 has no concept of folders, the convention is to just encode the path inside the key (the key can have slashes).
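To make the "path inside the key" convention concrete, here’s a minimal sketch in shell. The function name s3_key_for and the example paths are made up for illustration, and the keys a real sync tool generates may be laid out a little differently; this only shows the idea of a flat key that happens to contain slashes.

```shell
#!/bin/sh
# Hypothetical helper (not part of any S3 tool): build a flat S3 key
# from a key prefix and a file's path relative to the backed-up folder.
s3_key_for() {
    prefix="$1"   # key prefix inside the bucket, e.g. "documents"
    root="$2"     # local folder being backed up
    file="$3"     # full path of a file somewhere under $root
    rel="${file#"$root"/}"          # strip the local root, keep the rest
    printf '%s/%s\n' "$prefix" "$rel"
}

# prints: documents/taxes/2008.pdf
s3_key_for "documents" "/home/you/Documents" "/home/you/Documents/taxes/2008.pdf"
```

There is no "taxes" folder anywhere on S3 in this picture; tools that show folders are just splitting the key on the slashes.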

The great news is that an anonymous coder has done all of the grunt work for you to make the backup scenario work, via a tool called s3sync. Basically, it’s rsync to S3 – you create an initial key to put a root folder under, then run s3sync to copy it over. Since s3sync only copies over the files that are new or have changed, it saves bandwidth (and by extension, cash). Here’s how to run your first sync:

# You find this on the S3 site when you sign up
export AWS_ACCESS_KEY_ID="FILLMEINHERE"
export AWS_SECRET_ACCESS_KEY="OMGSEKRITACCESSKEY"

tar -xzvf s3sync*.tar.gz
cd s3sync
chmod +x *.rb  # Just to make sure
./s3cmd.rb createbucket yourusername-backup   # Only need to do this 1x!

# -(r)ecursive, -(v)erbose, --delete (remove S3 files that no longer exist locally)
# Sync your Documents folder to S3
./s3sync.rb -r -v --delete "$HOME/Documents" "yourusername-backup:documents"
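Since a sync with the delete option can remove things on the S3 side, it may be worth previewing it first, and it’s also worth knowing that the same tool runs in the other direction for a restore. Here’s a rough sketch of both as shell functions. The function names are made up, and the -n (dry run) flag is an assumption about your s3sync version, so check ./s3sync.rb --help before relying on it.

```shell
#!/bin/sh
# Path to s3sync.rb; override S3SYNC if you unpacked it somewhere else.
S3SYNC="${S3SYNC:-./s3sync.rb}"

# Show what a sync would do without uploading or deleting anything.
# Assumes this s3sync version supports -n (--dryrun).
preview_sync() {
    "$S3SYNC" -r -v -n --delete "$1" "$2"
}

# Copy a backup from S3 back to a local folder (S3 is the source here).
# No --delete, so nothing already in the local folder gets removed.
restore_from_s3() {
    "$S3SYNC" -r -v "$1" "$2"
}
```

With the credentials exported as above, usage would look like preview_sync "$HOME/Documents" "yourusername-backup:documents" for the preview, and restore_from_s3 "yourusername-backup:documents" "$HOME/restore-test" for a trial restore, where restore-test is just a placeholder for an existing empty folder.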

Seeing what’s on your S3 account

While you’re setting this up and making test runs, it’s pretty useful to be able to see what’s currently in your S3 account. To do this, there’s a great Java applet called Cockpit, by James Murty.


Runs in-browser, great for management and verifying the backup worked
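If you’d rather stay in the terminal, the s3cmd.rb script that ships next to s3sync.rb can list keys too. A small sketch, assuming its list command accepts a bucket:prefix argument (run ./s3cmd.rb with no arguments to see the commands your copy supports); the function names are invented for this example.

```shell
#!/bin/sh
# Path to s3cmd.rb; override S3CMD if it lives somewhere else.
S3CMD="${S3CMD:-./s3cmd.rb}"

# List every key stored under a prefix in a bucket.
list_backup() {
    "$S3CMD" list "$1:$2"
}

# List only the keys under a prefix that contain a fixed string,
# e.g. to confirm one particular file made it up.
find_in_backup() {
    list_backup "$1" "$2" | grep -F -- "$3"
}
```

For example, find_in_backup yourusername-backup documents 2008.pdf would print any matching keys, and print nothing (with a non-zero exit status) if the file never got uploaded.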

Make it Automatic

Now that we know how to do one sync, making it automatic is the most important part – if you have to remember to do it, you’re bound to forget. I put it into a script:

#!/bin/sh

### Make sure to fill in the blanks here!
export AWS_ACCESS_KEY_ID="FILLMEINHERE"
export AWS_SECRET_ACCESS_KEY="OMGSEKRITACCESSKEY"
export BUCKET_ID="paulbetts-backup"
export BACKUP_PATH="/storage"
export BACKUP_KEY="website"

echo "**** Backup start ****"
date
/root/s3sync/s3sync.rb -r -v --delete "$BACKUP_PATH" "$BUCKET_ID:$BACKUP_KEY"

And then, put that script into a cron job, so that it runs at 4am every morning:

# m h  dom mon dow   command
0 4 * * * /root/storage_backup >> /var/log/s3backup.log 2>&1
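One thing to watch for with a daily cron job: the first upload of a big folder can easily take longer than a day, and cron will happily start a second copy on top of the first. Here’s a minimal sketch of a guard against that, using a lock directory. The names LOCK_DIR and run_exclusively, and the /tmp path, are placeholders for this example rather than anything s3sync provides.

```shell
#!/bin/sh
# Hypothetical lock location; any path the script can write to works.
LOCK_DIR="${LOCK_DIR:-/tmp/s3backup.lock}"

# Run a command only if no other run currently holds the lock.
# mkdir is atomic, so only one caller at a time can create the directory.
run_exclusively() {
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous backup still running, skipping this run" >&2
        return 1
    fi
    "$@"
    rc=$?
    rmdir "$LOCK_DIR"   # release the lock whether the command succeeded or not
    return $rc
}
```

In the backup script, the sync line would then become run_exclusively /root/s3sync/s3sync.rb -r -v --delete "$BACKUP_PATH" "$BUCKET_ID:$BACKUP_KEY". If the machine dies mid-backup, the lock directory is left behind and has to be removed by hand before the next run will start.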

What if I’m using Windows?

If you’re using Windows, this approach is going to be an order of magnitude more annoying, due to the difficulties that Ruby has with the Windows filesystem (backslashes, ACLs, etc.) – while you might be able to get it to work, I can’t recommend it. However, one of the developers from Cloudberry Labs emailed me about some tools for Windows centered around S3 backup that look pretty promising, especially the potential for easy automation via their PowerShell Snapin.

Written by Paul Betts

May 15th, 2009 at 10:23 pm

Posted in Apple, Linux