Thursday, August 28, 2008

Answering Monty's Challenge: Advanced Replication for MySQL

Today Continuent is publishing the Tungsten Replicator, which provides advanced open source master/slave replication for MySQL. Publishing code is the first step to creating a robust alternative to current MySQL replication and will be followed by similar support for Oracle, PostgreSQL, and many other databases.

We started with master/slave replication on MySQL for a very simple reason: we know it well. And we know that while MySQL replication has many wonderful features like simple set-up, it also has many deficiencies that have persisted for a long time. Monty Widenius, a widely respected MySQL engineer, summarized some of the key problems last April:

- replication is not fail safe
- no synchronous options
- no checking consistency option
- setup and resync of slave is complicated
- single thread on the slave
- no multi-master
- only InnoDB synchronizes with the replication (binary) log

These issues are well known to the MySQL community. Monty laid down a challenge, but we all know the community can write software that solves it. However, there’s a much bigger challenge out there. There are highly capable replication products produced by commercial vendors like GoldenGate, Quest, Oracle, Sybase, and others. They handle high availability, performance scaling, upgrade, heterogeneous replication, cross-site clustering, you name it. Why aren’t these capabilities available in an open source product? Why doesn’t that open source product have the ease-of-use and accessibility MySQL is famous for?

The Tungsten Replicator is designed to answer that challenge. Here’s the initial feature set:
  • Simple set-up procedure
  • Master/slave replication of one, some, or all databases
  • MySQL statement replication
  • Proper handling of master failover in presence of multiple slaves
  • Checksums on replication events
  • Table consistency check mechanism
And here’s the roadmap:
  • Group communications-based management
  • Oracle support
  • PostgreSQL support
  • MySQL row replication
  • Heterogeneous replication
  • Multi-master via bi-directional replication with conflict resolution
  • Semi-synchronous replication
  • Parallel update on slaves to increase performance
  • Proxying support to reduce or eliminate application changes
We are implementing all of these features in a way that abstracts out platform and database differences. The architecture is not just database-neutral: by making it possible to extract from one database type and push to another, we lay a cornerstone for heterogeneous data transfer.
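
To make the extract-and-apply idea concrete, here is a minimal Python sketch. It is not the Tungsten Replicator code or its API. Every name in it (`ReplicationEvent`, `Extractor`, `Applier`, `ListExtractor`, `ListApplier`, `replicate`) is hypothetical, and the CRC32 checksum is only an assumed stand-in for whatever checksum the replicator actually uses. It shows how a database-neutral event with a checksum could flow from one kind of source to another kind of target.

```python
import zlib
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class ReplicationEvent:
    """Hypothetical database-neutral event: a sequence number, a SQL
    statement, and a checksum computed over the statement text."""
    seqno: int
    statement: str
    checksum: int

    @staticmethod
    def create(seqno: int, statement: str) -> "ReplicationEvent":
        # CRC32 is an assumption made for this illustration only.
        return ReplicationEvent(seqno, statement,
                                zlib.crc32(statement.encode("utf-8")))

    def is_intact(self) -> bool:
        # Recompute the checksum and compare it with the stored value.
        return zlib.crc32(self.statement.encode("utf-8")) == self.checksum


class Extractor(Protocol):
    """Anything that can produce events from a source database."""
    def extract(self) -> Iterable[ReplicationEvent]: ...


class Applier(Protocol):
    """Anything that can apply events to a target database."""
    def apply(self, event: ReplicationEvent) -> None: ...


class ListExtractor:
    """In-memory stand-in for a master log reader."""
    def __init__(self, statements: List[str]) -> None:
        self._statements = statements

    def extract(self) -> Iterable[ReplicationEvent]:
        for seqno, statement in enumerate(self._statements):
            yield ReplicationEvent.create(seqno, statement)


class ListApplier:
    """In-memory stand-in for a slave; it just records statements."""
    def __init__(self) -> None:
        self.applied: List[str] = []

    def apply(self, event: ReplicationEvent) -> None:
        self.applied.append(event.statement)


def replicate(extractor: Extractor, applier: Applier) -> int:
    """Move events from extractor to applier, refusing corrupted ones.
    Returns the number of events applied."""
    count = 0
    for event in extractor.extract():
        if not event.is_intact():
            raise ValueError(f"checksum mismatch at seqno {event.seqno}")
        applier.apply(event)
        count += 1
    return count
```

Because `replicate` depends only on the two small interfaces, swapping the in-memory classes for, say, a MySQL extractor and a PostgreSQL applier would not change the loop itself. That is the property the paragraph above describes.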

Tungsten Replicator is available on our community website at http://community.continuent.com. Stop by and check it out. The code is in the early stages but will mature very rapidly. You can help us guide it forward. We are looking forward to answering Monty’s challenge and going much further. We are looking forward to creating something that brings powerful replication within the reach of every database user.

Wednesday, August 6, 2008

Drizzle is Cool but Confusing

Brian Aker's Drizzle post was the most interesting news to emerge during OSCON 2008. In case you have been on vacation, Drizzle is a stripped-down version of MySQL for horizontally scaled web applications and Cloud Computing. Full-blown SQL databases are often overkill here, a point of view espoused by this blog among others.

It's easy to get excited about Drizzle. Brian, Monty, and others define the problem space very clearly and list some intriguing feature ideas on the Drizzle wiki. Just one example: sharding across multiple nodes, which is key to scaling massive reads and writes. From a technical perspective, it sounds cool.

Still, there's a dark side for Sun's database business. In addition to unfinished product versions and storage engines, there have now been at least three announced forks of the MySQL code in the last few months. It is thought-provoking that some of the most respected MySQL engineers inside and outside Sun are working on an alternative to the flagship product. This is the prelude to a classic trap that scuttled Informix among others in the 1990s. Even in the best case enterprise users will find it confusing.

Drizzle illustrates a problem with open source dialectics that has been developing since before the Sun acquisition: there's a big difference between open source to drive technology and open source to market enterprise products. MySQL is a big tent with multiple products and business models uncomfortably rolled into one. There's no reason not to split them up into separate offerings with appropriate open source models for their respective markets. Other database vendors do this. However, Sun is running out of time to get the marketing right.

Meanwhile, for techies looking at large web applications or for Cloud developers, Drizzle is not confusing at all. It's time to download the code and see what's up. It could be really cool.

Scaling Databases Using Commodity Hardware and Shared-Nothing Design