Thursday, August 28, 2008

Answering Monty's Challenge: Advanced Replication for MySQL

Today Continuent is publishing the Tungsten Replicator, which provides advanced open source master/slave replication for MySQL. Publishing code is the first step to creating a robust alternative to current MySQL replication and will be followed by similar support for Oracle, PostgreSQL, and many other databases.

We started with master/slave replication on MySQL for a very simple reason: we know it well. And we know that while MySQL replication has many wonderful features like simple set-up, it also has many deficiencies that have persisted for a long time. Monty Widenius, a widely respected MySQL engineer, summarized some of the key problems last April:

- replication is not fail safe
- no synchronous options
- no checking consistency option
- setup and resync of slave is complicated
- single thread on the slave
- no multi-master
- only InnoDB synchronizes with the replication (binary) log

These issues are well known to the MySQL community. Monty laid down a challenge, but we all know the community can write software that solves it. However, there’s a much bigger challenge out there. There are highly capable replication products produced by commercial vendors like GoldenGate, Quest, Oracle, Sybase, and others. They handle high availability, performance scaling, upgrade, heterogeneous replication, cross-site clustering: you name it. Why aren’t these capabilities available in an open source product? Why doesn’t that open source product have the ease of use and accessibility MySQL is famous for?

The Tungsten Replicator is designed to answer that challenge. Here’s the initial feature set:
  • Simple set-up procedure
  • Master/slave replication of one, some, or all databases
  • MySQL statement replication
  • Proper handling of master failover in presence of multiple slaves
  • Checksums on replication events
  • Table consistency check mechanism

And here’s the roadmap:
  • Group communications-based management
  • Oracle support
  • PostgreSQL support
  • MySQL row replication
  • Heterogeneous replication
  • Multi-master via bi-directional replication with conflict resolution
  • Semi-synchronous replication
  • Parallel update on slaves to increase performance
  • Proxying support to reduce or eliminate application changes

We are implementing all of these features in a way that abstracts out platform and database differences. The architecture is not just database-neutral--by making it possible to extract from one database type and push to another we lay a cornerstone for heterogeneous data transfer.
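
As a rough illustration of what "extract from one database type and push to another" can look like, here is a minimal Python sketch. It is not Tungsten code: the names ReplicationEvent, Extractor, Applier, ListExtractor, RecordingApplier, and replicate are all invented for this example, and the two concrete classes are in-memory stand-ins for real database-specific components.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Protocol


@dataclass(frozen=True)
class ReplicationEvent:
    """A database-neutral change record (hypothetical shape)."""
    seqno: int
    statement: str


class Extractor(Protocol):
    """Anything that can produce events from a source database."""
    def extract(self) -> Iterator[ReplicationEvent]: ...


class Applier(Protocol):
    """Anything that can apply events to a target database."""
    def apply(self, event: ReplicationEvent) -> None: ...


class ListExtractor:
    """Stand-in for a source-specific extractor, e.g. one reading a MySQL binlog."""

    def __init__(self, statements: Iterable[str]) -> None:
        self._statements = list(statements)

    def extract(self) -> Iterator[ReplicationEvent]:
        for seqno, statement in enumerate(self._statements):
            yield ReplicationEvent(seqno, statement)


class RecordingApplier:
    """Stand-in for a target-specific applier; it only records what it would run."""

    def __init__(self) -> None:
        self.applied: List[ReplicationEvent] = []

    def apply(self, event: ReplicationEvent) -> None:
        self.applied.append(event)


def replicate(source: Extractor, target: Applier) -> int:
    """Move every event from source to target; return how many were applied."""
    count = 0
    for event in source.extract():
        target.apply(event)
        count += 1
    return count
```

The point of the sketch is that replicate() only knows about the two interfaces, so in principle a different extractor or applier could be swapped in without touching the replication loop.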

Tungsten Replicator is available on our community website at http://community.continuent.com. Stop by and check it out. The code is in the early stages but will mature very rapidly. You can help us guide it forward. We are looking forward to answering Monty’s challenge and going much further. We are looking forward to creating something that brings powerful replication within the reach of every database user.

11 comments:

Ronald Bradford said...

It's a pity I missed your webinar, but I'll be reviewing this in more detail.

Most interesting.

Ronald
http://ronaldbradford.com

Robert Hodges said...

Hi Ron,

Don't worry, we'll probably be talking about this until everyone is heartily sick of it. :) Seriously, thanks for checking it out. We're really interested in community opinion on this effort.

Robert

Mark Callaghan said...

How does this capture events on the MySQL master? From the architecture document, I think this tails the binlog on the master.

Robert Hodges said...

That's correct--we tail the binlog, assigning transaction IDs, and place the result in the transaction history log (THL), currently stored in the database. This addresses the problem of having fungible transaction IDs but for now inherits two-phase commit issues.
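
To make that flow a bit more concrete, here is a toy Python sketch of the idea, not the replicator's actual code. The names ThlRecord, TransactionHistoryLog, and tail_binlog are made up for illustration, the log lives in memory rather than in the database, and CRC32 is only an assumed choice of checksum.

```python
import zlib
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ThlRecord:
    seqno: int      # replicator-assigned transaction ID
    payload: bytes  # the event as read from the binlog
    checksum: int   # CRC32 of the payload (assumed checksum algorithm)


class TransactionHistoryLog:
    """In-memory stand-in for the THL."""

    def __init__(self) -> None:
        self._records: List[ThlRecord] = []

    def append(self, payload: bytes) -> ThlRecord:
        # IDs are assigned here, in arrival order, independent of binlog positions.
        record = ThlRecord(len(self._records), payload, zlib.crc32(payload))
        self._records.append(record)
        return record

    def read_from(self, seqno: int) -> List[ThlRecord]:
        """Return records with IDs >= seqno, verifying each checksum."""
        result = []
        for record in self._records[max(seqno, 0):]:
            if zlib.crc32(record.payload) != record.checksum:
                raise ValueError(f"checksum mismatch at seqno {record.seqno}")
            result.append(record)
        return result


def tail_binlog(events: Iterable[bytes], thl: TransactionHistoryLog) -> int:
    """Copy events into the THL; return the last ID assigned, or -1 if none."""
    last = -1
    for payload in events:
        last = thl.append(payload).seqno
    return last
```

A slave that has applied everything up to some ID could then ask for read_from(next_id), regardless of which master's binlog the events originally came from.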

Anonymous said...

And how about SQLite support? It's a very fast, high-quality database management system. There are a lot of installations of SQLite and related software.

Robert Hodges said...

That's a really interesting question about SQLite. I'm looking into whether we can support it. Thanks for posting!

burtonator said...

A couple of feature suggestions from the problems we've run into at Spinn3r:

1. make it fully crash safe, optionally using transactions to store metadata on transactional storage engines.

You'll probably have to replace your use of the binlogs as this is one of the problems with the current replication setup.

The current MySQL replicator is not crash safe.

2. Checksums (which you already have)

.... some more thoughts.

The parallel update on slave feature ROCKS and will probably push me to try it out.

Robert Hodges said...

Thanks for the suggestions! I'm already thinking about how to get off binlogs, but for now we are using them because they are easy to read. Other databases like Oracle and PostgreSQL don't have the consistency issues but are far harder to parse.

Parallel update will be more practical once we have row replication. Currently we are reading statements. This will change soon.

Please sign up on the Tungsten Replicator mailing list if you want to keep track of development.

burtonator said...

Oh..... the other thing I was going to suggest was to use O_DIRECT to write your own binary logs (if you can).

This way the Linux swap bug doesn't bite you and you're not constantly causing Linux to page.

If you need it block aligned you can just write mod 512 byte units.
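
A small Python sketch of the "mod 512" idea, just the length arithmetic and nothing more. The 512-byte block size is taken from the suggestion above; the helper names padded_length and pad_to_block are invented here. Note that O_DIRECT typically also expects the memory buffer and the file offset to be aligned, which this sketch does not address.

```python
BLOCK_SIZE = 512  # assumed block size, per the "mod 512" suggestion


def padded_length(length: int, block_size: int = BLOCK_SIZE) -> int:
    """Round length up to the next multiple of block_size."""
    return (length + block_size - 1) // block_size * block_size


def pad_to_block(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append zero bytes so the result's length is a multiple of block_size."""
    return data + b"\x00" * (padded_length(len(data), block_size) - len(data))
```

Since the padding is plain zero bytes, a reader would need some other way to recover the true record length, for example a length field in the record itself.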

This turns out to be a big problem in our setup.

Robert Hodges said...

@burtonator
Hi Kevin, your experiences sound pretty interesting. Thanks again for the posts.

ms said...

Hopefully there will be plans for MS SQL Server support. I think there's a lot of opportunity there.
