I have blogged about our goals for Tungsten Replicator quite a bit, for instance here and here. We want the Replicator to be platform-independent and database-neutral. We also want it to be as flexible as possible, so that our users can:
- Support new databases easily
- Filter and transform SQL events flexibly
- Replicate between databases and applications, messaging systems, or files that you don't traditionally combine with replication
There are three main types of plug-ins in Tungsten Replicator.
- Extractors remove data from a source, usually a database.
- Appliers put the events in a target, usually a database.
- Filters transform or drop events after extraction or before application.
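To give a feel for the shape of these plug-ins, here is a minimal sketch of a filter. The interfaces and names below are illustrative only, not Tungsten's actual API: a filter receives each replicated event and either transforms it, passes it through, or drops it.

```java
// Illustrative sketch: these interfaces approximate the idea of a
// replicator plug-in; they are NOT Tungsten's actual plug-in API.
interface ReplicationEvent {
    String getSchemaName();
    String getSql();
}

// A Filter transforms an event, passes it through, or drops it
// by returning null.
interface Filter {
    ReplicationEvent filter(ReplicationEvent event);
}

// Example: drop all events that touch a given schema.
class SchemaDropFilter implements Filter {
    private final String schemaToDrop;

    SchemaDropFilter(String schemaToDrop) {
        this.schemaToDrop = schemaToDrop;
    }

    public ReplicationEvent filter(ReplicationEvent event) {
        if (schemaToDrop.equals(event.getSchemaName()))
            return null;  // discard the event
        return event;     // pass everything else through
    }
}
```

Filters compose naturally: a pipeline can run several of them in sequence between the extractor and the applier.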
Say you are using Memcached to hold pages for a media application. The media database is loaded from a "dumb" third-party feed piped in through mysql. Normally you would set up some mechanism within the feed that connects to the database and then updates Memcached accordingly. That works, but your feed processor just got a lot more complicated. There's a better way. You can write an Applier that converts SQL events from the database into Memcached calls that invalidate the corresponding pages, plus a Filter that throws away any SQL events you don't care about. Voila! Problem solved. And because this approach works off the database log, it works no matter how you load the database. That's even better.
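A sketch of that invalidating applier might look like the following. Again, everything here is hypothetical: CacheClient stands in for a real Memcached client library, the Applier interface approximates the plug-in contract, and the "page:schema:table:pk" key scheme is just an assumed convention for this example.

```java
// Illustrative sketch: CacheClient stands in for a real Memcached
// client, and Applier approximates the replicator's plug-in contract;
// neither is Tungsten's actual API.
interface CacheClient {
    void delete(String key);
}

interface Applier {
    void apply(String schema, String table, String primaryKey);
}

// Applies a replicated row change by invalidating the cached page
// that was rendered from that row.
class MemcachedInvalidatingApplier implements Applier {
    private final CacheClient cache;

    MemcachedInvalidatingApplier(CacheClient cache) {
        this.cache = cache;
    }

    public void apply(String schema, String table, String primaryKey) {
        // Map the changed row to the cache key of the page built
        // from it (hypothetical key convention).
        String key = "page:" + schema + ":" + table + ":" + primaryKey;
        cache.delete(key);
    }
}
```

In a real deployment the CacheClient would be backed by an actual Memcached connection, and the applier would receive row-change events from the pipeline rather than raw strings.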
Tungsten Beta has a number of other interesting features beyond pluggable replication. Our next builds will support MySQL row replication fully and have much better heterogeneous replication. I'm going to cover these in future blog posts. Incidentally, MySQL 5.1 row replication is a highly enabling feature for many data integration problems. If you have not checked it out already, I hope our replication will motivate you to do so in the very near future.
Meanwhile, please download the build and take it out for a spin. Builds, documentation, bug tracking, wikis, and much more are available on our community site. Have fun!
8 comments:
Robert,
This sounds exciting. I am impressed by your work. As you know, web companies are typically all-MySQL but in the enterprise space we nearly always live side-by-side with Oracle or some other DBMS. Fast and reliable replication is key to both.
All the best in building a huge community and business around Tungsten.
Marten Mickos
SVP Database Group, Sun
(formerly CEO of MySQL AB)
Hi,
Have you done any performance benchmarks? How long does it take for an insert, update, or delete on one master to replicate to another master on the same network?
Hi Marten,
Thanks for the kind words. The heterogeneous use cases look really huge for open source databases--in my view this is how MySQL and other open source databases really get to the next level in enterprises. I'm quite hopeful we'll be able to replicate from MySQL into most JDBC-enabled databases by end of the year.
Cheers, Robert
Hi Vasudevan,
Yes, I have done some very basic benchmarking on the alpha. I was able to hit 1300 inserts/second using one of our standard tests from Bristlecone (http://www.continuent.com/community/bristlecone). I expect we'll be able to reach at least 2500 inserts/second on a basic 4-core Intel box by the time the first release goes production. This is comparable to figures for PostgreSQL Slony, which shares a number of design similarities. So, not as fast as native MySQL for now, but plenty fast for most applications.
Cheers, Robert
Hi,
Thanks for the quick update.
How many milliseconds does it take for an insert or update done on one master to appear on another master?
For example, within how many milliseconds can I see the 1300 records per second inserted on Master A appear on Master B?
with regds,
Vasu
Hi Vasu,
Here's how the test works. We insert each row into the master and then check when it arrives on the slave; that defines the transaction time. When I say 1300 per second, that means we fully replicated that number of updates from master to slave each second.
However, master/slave performance is more complex than this. Like most implementations, including MySQL's, our slave updates are currently single-threaded. This means that a slow DDL statement will hold up later updates that would otherwise process quite quickly. We implemented row replication right from the start because it gives us a leg up on implementing parallel updates, which are the key to reducing slave latency.
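To make the head-of-line blocking concrete, here is a toy model (not Tungsten code) of completion times in a single-threaded applier: statements apply strictly one after another, so each statement finishes only after everything ahead of it.

```java
// Toy model of a single-threaded applier: each statement's completion
// time is the running sum of all execution costs ahead of it.
class SerialApplyModel {
    // costsMs[i] is statement i's own execution cost in milliseconds;
    // the result gives each statement's completion time on the slave.
    static long[] completionTimes(long[] costsMs) {
        long[] done = new long[costsMs.length];
        long clock = 0;
        for (int i = 0; i < costsMs.length; i++) {
            clock += costsMs[i];  // statements apply serially
            done[i] = clock;
        }
        return done;
    }
}
```

With per-statement costs of {1, 5000, 1, 1} ms, the two fast updates queued behind the slow statement do not finish until 5002 and 5003 ms, even though each costs only 1 ms on its own. Parallel apply attacks exactly this bottleneck.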
Robert
I have three more queries.
1. Is it synchronous or asynchronous?
2. Is it unidirectional or bidirectional?
3. If it is bidirectional, does it have any performance bottlenecks?
with regds,
Vasu
Hi Vasu,
1.) It's asynchronous. (For synchronous replication, have a look at Sequoia.)
2.) It's uni-directional.
3.) You can make it bi-directional by running replicators in both directions, but there's no special support for it. We will add features specifically for bi-directional replication in a later release; it's a problem that has interested us for a long time. If you have feature suggestions, please post them in the forums or on the treplicator mailing list.
For more questions, check out the documentation referenced in the blog article.
Cheers, Robert