Monday, November 17, 2008

Announcing Tungsten Replicator Beta for MySQL

Pluggable open source replication has arrived, at least in beta form. Today we are releasing Tungsten Replicator 1.0 Beta-1 with support for MySQL. This release is the next step in bringing advanced data replication capabilities to open source and has many improvements and bug fixes. It also (finally) has complete documentation. I would like to focus on an interesting feature that is fully developed in this build: pluggable replication.

I have blogged about our goals for Tungsten Replicator quite a bit, for instance here and here. We want the Replicator to be platform-independent and database-neutral. We also want it to be as flexible as possible, so that our users can:
  • Support new databases easily
  • Filter and transform SQL events flexibly
  • Replicate between databases and applications, messaging systems, or files that you don't traditionally combine with replication
It was clear from the start that we needed to factor the design cleanly. The result is an architecture where the main moving parts are interchangeable plug-ins.

There are three main types of plug-ins in Tungsten Replicator.
  • Extractors remove data from a source, usually a database.
  • Appliers put the events in a target, usually a database.
  • Filters transform or drop events after extraction or before application.
This sounds pretty simple and it is. But it turns out to be amazingly flexible. I'll just give one example.
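To make the three roles concrete, here is a minimal sketch of how an extractor, a filter chain, and an applier fit together. This is a hypothetical illustration, not the actual Tungsten plug-in API (which is Java); the event format and function names are assumptions.

```python
# Hypothetical sketch of the three plug-in roles and how they chain together.
# Not the real Tungsten API: events are plain strings here for illustration.

def run_pipeline(extractor, filters, applier):
    """Pull events from the extractor, pass each through the filter chain,
    and hand survivors to the applier. A filter returns None to drop an event."""
    applied = []
    for event in extractor():
        for f in filters:
            event = f(event)
            if event is None:
                break  # event was dropped by a filter
        if event is not None:
            applier(event)
            applied.append(event)
    return applied

# Toy plug-ins: extract two SQL events, drop anything touching `stats`.
extract = lambda: ["INSERT INTO media VALUES (1)", "UPDATE stats SET n = n + 1"]
drop_stats = lambda ev: None if "stats" in ev else ev
apply_event = lambda ev: print("applying:", ev)

run_pipeline(extract, [drop_stats], apply_event)  # only the media INSERT survives
```

The point of the factoring is that each role can be swapped independently: a new database means a new extractor or applier, not a new replicator.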

Say you are using Memcached to cache pages for a media application, and the media database is loaded from a "dumb" third-party feed piped in through the mysql client. Normally you would build some mechanism into the feed processor that connects to the database and then updates Memcached accordingly. That works, but your feed processor just got a lot more complicated. There's a better way: write an Applier that converts SQL events from the database into Memcached calls that invalidate the corresponding pages, plus a Filter that throws away any SQL events you don't want to see. Voila! Problem solved. Better still, because this approach works off the database log, it works no matter how you load the database.
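A sketch of what such an invalidation Applier might look like. The event shape, key scheme, and class names here are assumptions for illustration, not the real Tungsten plug-in interface or a real Memcached client:

```python
# Hypothetical Applier that invalidates cached pages when the `media` table
# changes. `cache` can be any object with a delete(key) method; a real
# deployment would pass in an actual Memcached client instead of the fake below.

class MemcachedInvalidationApplier:
    def __init__(self, cache, table="media"):
        self.cache = cache
        self.table = table

    def apply(self, event):
        # Assume each replicated event carries the table name and primary key.
        table, pk = event["table"], event["pk"]
        if table == self.table:
            self.cache.delete("page:%s:%s" % (table, pk))  # drop the stale page

class FakeCache:
    """Stand-in for a Memcached client, for testing the Applier logic."""
    def __init__(self):
        self.deleted = []
    def delete(self, key):
        self.deleted.append(key)

cache = FakeCache()
applier = MemcachedInvalidationApplier(cache)
applier.apply({"table": "media", "pk": 42})
applier.apply({"table": "users", "pk": 7})   # ignored: wrong table
print(cache.deleted)  # ['page:media:42']
```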

Tungsten Beta has a number of other interesting features beyond pluggable replication. Our next builds will support MySQL row replication fully and have much better heterogeneous replication. I'm going to cover these in future blog posts. Incidentally, MySQL 5.1 row replication is a highly enabling feature for many data integration problems. If you have not checked it out already, I hope our replication will motivate you to do so in the very near future.

Meanwhile, please download the build and take it out for a spin. Builds, documentation, bug tracking, wikis and much more are available on our community site. Have fun!

8 comments:

Anonymous said...

Robert,

This sounds exciting. I am impressed by your work. As you know, web companies are typically all-MySQL but in the enterprise space we nearly always live side-by-side with Oracle or some other DBMS. Fast and reliable replication is key to both.

All the best in building a huge community and business around Tungsten.

Marten Mickos
SVP Database Group, Sun
(formerly CEO of MySQL AB)

Vasudevan said...

Hi,

Have you done any performance benchmarks? How long does it take for an insert, update, or delete on one master to replicate to another master on the same network?

Robert Hodges said...

Hi Marten,

Thanks for the kind words. The heterogeneous use cases look really huge for open source databases--in my view this is how MySQL and other open source databases really get to the next level in enterprises. I'm quite hopeful we'll be able to replicate from MySQL into most JDBC-enabled databases by end of the year.

Cheers, Robert

Robert Hodges said...

Hi Vasudevan,

Yes, I have done some very basic benchmarking on the alpha. I was able to hit 1300 inserts/second using one of our standard tests from Bristlecone (http://www.continuent.com/community/bristlecone). I expect we'll be able to reach at least 2500 inserts/second on a basic 4-core Intel box by the time the first release goes production. This is comparable to figures for PostgreSQL's Slony, which shares a number of design similarities. So, not as fast as native MySQL replication for now, but plenty fast for most applications.

Cheers, Robert

Vasudevan said...

Hi,

Thanks for the quick update.

How many milliseconds does it take for an insert or update done on one master to appear on the other master?

For example, within how many milliseconds can I see the 1300 records/second inserted on Master A appear on Master B?

with regds,
Vasu

Robert Hodges said...

Hi Vasu,

Here's how the test works. We insert each row into the master and then check to see when it arrives on the slave; that defines the transaction time. When I say 1300 per second, I mean we fully replicated that number of updates from master to slave in each second.
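The measurement described above can be sketched as a simple insert-then-poll loop. The function names below are hypothetical stand-ins for real database calls, not Bristlecone's actual test harness:

```python
# Sketch of the latency test described above: commit a marker row on the
# master, then poll the slave until the row becomes visible. The two callables
# are stand-ins for real database queries (e.g. via JDBC or DB-API).
import time

def measure_latency(insert_on_master, row_visible_on_slave, timeout=5.0, poll=0.001):
    start = time.time()
    insert_on_master()
    while time.time() - start < timeout:
        if row_visible_on_slave():
            return time.time() - start      # seconds from commit to visibility
        time.sleep(poll)
    raise TimeoutError("row never appeared on slave")

# Toy stand-ins: the "slave" sees the row after roughly 10 ms.
deadline = time.time() + 0.01
latency = measure_latency(lambda: None, lambda: time.time() >= deadline)
print("latency: %.1f ms" % (latency * 1000))
```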

However, master/slave performance is more complex than this. Like most implementations, including MySQL's, our slave updates are currently single-threaded. This means that a slow DDL statement will hold up later updates that would otherwise process quite quickly. We implemented row replication right from the start because it gives us a leg up on implementing parallel updates, which are the key to reducing slave latency.

Robert

Vasudevan said...

I have three more queries.

1. Is it synchronous or asynchronous?
2. Is it unidirectional or bidirectional?
  3. If it is bidirectional, does it have any performance bottlenecks?

with regds,
Vasu

Robert Hodges said...

Hi Vasu,

1.) It's asynchronous. (For synchronous have a look at Sequoia.)

2.) It's uni-directional.

3.) You can make it bi-directional by running replicators in both directions, but there's no special support for it. We will add features specifically for bi-directional replication in a later release. It's a problem that has interested us for a long time. If you have suggested features, please post them in the forums or on the treplicator mailing list.

For more questions, check out the documentation referenced in the blog article.

Cheers, Robert
