Monday, October 22, 2007

My First SoCal Fire

Southern California, as you may know, is on fire. Now I'm fortunate that I haven't had to evacuate. And since the closest fire to me has been 100% contained, it's unlikely that I will need to evacuate in the future. Some of my co-workers weren't so lucky.

Even though there's no fire, there's still a ton of smoke. Around where I live it's not too bad. It's a bit smoky and hazy, but I don't really notice it indoors. The San Diego skyline yesterday, on the other hand, was like that of a post-apocalyptic war zone. The combination of the smoke and sun made the sky a bright yellow, and the haze reminded me of Valley fog.

Anyway, during the time I was unsure whether or not I needed to evacuate, I considered the flaws in my computer backup strategy, which is fairly comprehensive and born from being bitten in the ass by data loss too many times. My home directory, digital photos, and emails are all stored in a Subversion repository (this saved my ass in college when I accidentally deleted the directory containing my senior project). This has two benefits. First, it ensures my laptop is backed up. Second, it keeps my home directories in sync. Every night, a backup script mounts a USB drive connected to my computer and dumps the Subversion repository to it.
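For the curious, here is a minimal sketch of what the nightly incremental dump step might look like in Python. It is not the actual backup script. The function names, the idea of remembering the last dumped revision, and the example paths are all assumptions made for illustration. The only real commands it relies on are `svnlook youngest` and `svnadmin dump -r LOWER:UPPER --incremental`.

```python
import subprocess
from typing import List, Optional, Tuple


def next_revision_range(last_dumped: int, youngest: int) -> Optional[Tuple[int, int]]:
    """Return the (lower, upper) revisions still to be dumped, or None if up to date.

    last_dumped is the highest revision already backed up (hypothetical
    bookkeeping; use -1 when nothing has been dumped yet).
    """
    if youngest <= last_dumped:
        return None
    return (last_dumped + 1, youngest)


def dump_command(repo_path: str, lower: int, upper: int) -> List[str]:
    """Build the svnadmin command line for an incremental dump of lower..upper."""
    return ["svnadmin", "dump", repo_path, "-r", f"{lower}:{upper}", "--incremental"]


def youngest_revision(repo_path: str) -> int:
    """Ask Subversion for the newest revision number in the repository."""
    result = subprocess.run(
        ["svnlook", "youngest", repo_path],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


def run_dump(repo_path: str, lower: int, upper: int, out_path: str) -> None:
    """Write the incremental dump to out_path, raising if svnadmin fails."""
    with open(out_path, "wb") as out_file:
        subprocess.run(dump_command(repo_path, lower, upper), stdout=out_file, check=True)
```

A nightly job would call `youngest_revision`, pass the result to `next_revision_range` along with the last revision it recorded, and call `run_dump` only when there is something new to write.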

You can immediately see the problem, just as I have for some time. What happens if my apartment catches on fire? Or someone breaks in and steals both the computer and USB drive? Everything's gone in one fell swoop.

My original intent when I bought the USB drive was to go to the bank every week, take the drive out of my safe deposit box, perform a manual backup, then return the drive to my safe deposit box. Problem is, I'm lazy. Rule number 1 of backing things up is if it isn't automated, you didn't back it up. I'm not going to the bank once a week; I have better things to do. So it needs to be automated.

I've been looking at Amazon S3 for quite some time and think I'm ready to give it a shot. I've been concerned about how I would protect everything, but the final pieces of the puzzle have fallen into place. I'm going to describe my proposed new backup strategy below.

Incremental Subversion dumps will be made every night, as they are now. They will be encrypted using public key encryption, then stored on my USB drive and uploaded to S3. The public key will reside on the backup server, where it will encrypt the backup files. The private key will be broken up into three pieces using Shamir's method for sharing secrets. One piece will reside on a USB stick in the safe in my apartment, one piece will reside on a USB stick in my safe deposit box, and one piece will reside on S3. Any two of the three pieces will be enough to reconstruct the private key, so any one of the bank, my apartment, or the S3 network can be destroyed without losing the key, while anyone who wants the full private key still needs access to two of those locations. I'll also be putting paper copies of the key pieces into both my safe and my safe deposit box just in case the USB sticks die.
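To make the two-of-three idea concrete, here is a small sketch of Shamir's secret sharing in Python. It is only an illustration under some assumptions: the secret is a plain integer smaller than the chosen prime, standing in for real key material (an actual private key would be larger and would have to be split in chunks, or protected by a shorter secret that gets split instead). The names `split_secret` and `recover_secret` are made up for this example, and `pow(den, -1, PRIME)` needs Python 3.8 or newer.

```python
import secrets
from typing import List, Tuple

# A Mersenne prime; all arithmetic is done modulo this value, so the
# secret must be an integer smaller than it.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int = 2, shares: int = 3) -> List[Tuple[int, int]]:
    """Split secret into `shares` points, any `threshold` of which recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and the number of shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        # Horner's rule, highest-degree coefficient first.
        result = 0
        for coeff in reversed(coeffs):
            result = (result * x + coeff) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: List[Tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse of the denominator (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With a threshold of two, the polynomial is a straight line: one point alone says nothing about where the line crosses the y-axis, but any two points pin it down. That is why losing any single location is survivable, and why a thief needs to get into two of them.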

One thing I'm not quite clear on is how I'm going to validate that the files in S3 haven't been tampered with. I guess a combination of S3 permissions and not giving anyone the public key will do the trick. I'd considered signing them, but the end result is the same as just encrypting the files with a public key that isn't made public: the scheme depends on a private file no one else has.
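One way to picture the "private file no one else has" idea is a keyed hash. The sketch below uses HMAC-SHA256 from Python's standard library, which is a swapped-in technique and not the signing or S3 permissions options mentioned above. It assumes a secret key that lives only on the backup server and a tag recorded somewhere other than S3. The function names are hypothetical.

```python
import hashlib
import hmac


def tag_backup(data: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the (already encrypted) backup bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_untampered(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Check downloaded bytes against the tag recorded at upload time."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(tag_backup(data, key), expected_tag)
```

Anyone who alters a file on S3 without the key can't produce a matching tag, so a mismatch on download means the file changed after it was uploaded.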

Another question is how I'm going to verify the backup completes successfully each night. After all, rule number 2 of backing things up is if you didn't verify the backup works, you didn't back it up. The private key won't be on the computer being backed up (for obvious reasons), so I can't do an automated test (unless I want to do a dry run with a dummy public/private key pair, but that doesn't verify the production backups). I want to say I'll test the restore process every now and then, but I won't (see rule number 1). For now, it'll just have to be good enough.

Comments or suggestions on how to improve this scheme are welcome. In particular, if you want to solve the last two issues for me, I'm all ears.

