Monday, July 20, 2009

Backups, Backups, Backups (and Restore)

Tungsten Replicator now has built-in backup and restore for MySQL! I checked in the final touches over the weekend. Here's how to run a backup on a database and store it so you can restore it later. If you leave off the options, we use the default backup procedure and storage that you selected when setting up replication.
trepctl backup [-backup agent] [-storage agent] [-limit timeout]
And here's how to restore. If you leave off the options, we find the latest backup in your default storage and load that.
trepctl restore [-uri backup_uri] [-limit timeout]
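To make the option handling concrete, here is a small wrapper sketch that assembles the two command lines from the syntax above. The function names are my own, and the agent names in the usage lines (lvm, fs) are hypothetical placeholders rather than names confirmed by Tungsten; substitute whatever agents you configured. The timeout value is passed through as-is, so check the Tungsten documentation for its unit. The functions only print the command so you can inspect it before running it.

```shell
#!/bin/sh
# Sketch: build trepctl backup/restore command lines from optional values.
# Empty arguments are left out so trepctl falls back to its configured defaults.

# usage: build_backup_cmd [backup_agent] [storage_agent] [timeout]
build_backup_cmd() {
    cmd="trepctl backup"
    if [ -n "${1:-}" ]; then cmd="$cmd -backup $1"; fi
    if [ -n "${2:-}" ]; then cmd="$cmd -storage $2"; fi
    if [ -n "${3:-}" ]; then cmd="$cmd -limit $3"; fi
    echo "$cmd"
}

# usage: build_restore_cmd [backup_uri] [timeout]
build_restore_cmd() {
    cmd="trepctl restore"
    if [ -n "${1:-}" ]; then cmd="$cmd -uri $1"; fi
    if [ -n "${2:-}" ]; then cmd="$cmd -limit $2"; fi
    echo "$cmd"
}
```

For example, build_backup_cmd with no arguments prints the bare "trepctl backup" form that relies on your defaults, while build_backup_cmd lvm fs 600 prints the fully specified form. Passing an empty string skips that option, so build_restore_cmd "" 600 sets only the timeout.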
That's the syntax. Now here's what happens behind the scenes. First, Tungsten Replicator has a new BackupAgent plug-in that implements backup procedures. We have a backup agent for each of the following types of backups for MySQL:
  • Mysqldump - Probably least useful but easy to set up.
  • LVM snapshot to tar.gz - Scalable with minimal database downtime. Features are similar to Lenz Grimmer's excellent mylvmbackup script.
  • Script dump - Integrate your own script for backup and restore. The script has to follow some very simple conventions, for which there is an example. You can integrate practically any backup/restore package this way.
Speaking as an unbiased user of the system, I love the LVM snapshots. LVM snapshots are overall the most convenient way to do backups for a wide range of databases, not just MySQL, though I have to admit I have not used either InnoDB Hot Backup or Percona's XtraBackup. I guess now would be a good time to try them, since we can integrate them through the script dump mechanism.

Meanwhile, mysqldump is far and away my least favorite mechanism, not least because of this unusually heinous bug, which completely breaks the mysqldump --all-databases command. It's still not fixed as of at least MySQL 5.1.34. (How on earth did this one get in, and why didn't it get corrected instantly?)

Second, there is a new StorageAgent plug-in to handle storing and retrieving backup files. There is one of these for each type of storage. Currently the choice is limited to shared disk, but I expect we'll have an Amazon S3 storage plug-in in the near future. That's just too useful to pass up for very long. Among other things, we ourselves run all our company services on Amazon and I would like to use it for our own backups.
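
To illustrate the idea behind shared-disk storage, here is a rough sketch of a store that saves backup files under timestamped names and finds the latest one, which is what a restore without a URI needs. This is not Tungsten's StorageAgent code: the directory path, the file naming scheme, and the function names are all assumptions of mine for the sake of the example.

```shell
#!/bin/sh
# Sketch of a shared-disk backup store. Not Tungsten's StorageAgent code.
# STORAGE_DIR is a hypothetical directory visible to every replicator host.
STORAGE_DIR="${STORAGE_DIR:-/mnt/shared/backups}"

# store_backup FILE: copy FILE into the store under a timestamped name
# and print the path of the stored copy.
store_backup() {
    mkdir -p "$STORAGE_DIR" || return 1
    dest="$STORAGE_DIR/backup-$(date +%Y%m%d%H%M%S)-$$.tar.gz"
    cp "$1" "$dest" || return 1
    echo "$dest"
}

# latest_backup: print the stored backup with the newest timestamp in its
# name, or return non-zero if the store is empty.
latest_backup() {
    newest=$(ls -1 "$STORAGE_DIR"/backup-* 2>/dev/null | sort | tail -n 1)
    if [ -z "$newest" ]; then return 1; fi
    echo "$newest"
}
```

Because the timestamp is fixed-width, a plain lexical sort puts the newest file last, so no metadata file is needed to find it. The same two operations (store a file, fetch the latest) are what an S3-backed store would have to provide as well.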

If you want to use the new backup capability, you can either build Tungsten Replicator yourself using the instructions on our getting started page or wait until the next binary build comes out in a couple of weeks. Backup and restore are documented here in the Tungsten documentation.

We went through a lot of effort to make the backup and restore processes as simple as possible. It's down to one keyword for each operation, so I don't think it's going to get much simpler. Please try it out and provide your feedback. I love bug reports and want to hear what you think.
