Dec 26, 2009

Proving Master/Slave Clusters Work and Learning along the Way

2009 has been a big year for Tungsten. In January we had (barely) working replication for MySQL. It had some neat features like global IDs and event filters, but to be frank you needed imagination to see the real value. Since then, Tungsten has grown into a full-blown database clustering solution capable of handling a wide range of user problems. Here are just a few of the features we completed over the course of the year:
  • Autonomic cluster management using business rules to implement auto-discovery of new databases, failover, and quick recovery from failures
  • Built-in broadcast monitoring of databases and replicators
  • Integrated backup and restore operations
  • Pluggable replication management, proven by clustering implementations based on PostgreSQL Warm Standby and Londiste
  • Multiple routing mechanisms to provide seamless failover and load balancing of SQL
  • Last, but not least, simple command line installation to configure and start Tungsten in minutes
You can see the results in our latest release, Tungsten 1.2.1, which comes in both open source and commercial flavors. (See our downloads page to get software as well as documentation.)

In the latter part of 2009 we also worked through our first round of customer deployments, which was an adventure but helped Tungsten grow enormously. Along the way, we confirmed a number of hunches and learned some completely new lessons.
  • Hardware is changing the database game. In particular, performance improvements are shifting clustering in the direction of loosely coupled master/slave replication rather than tightly coupled multi-master approaches. As I laid out in a previous article, the problem space is shifting from database performance to availability, data protection, and utilization.
  • Software as a Service (SaaS) is an important driver for replication technology. Not only is the SaaS sector growing, but even small SaaS applications can result in complex database topologies that need parallel, bi-directional, and cross-site replication, among other features. SaaS business economics tend to drive building these systems on open source databases like MySQL and PostgreSQL. By supporting SaaS, you support many other applications as well.
  • Cluster management is hard but worthwhile. Building distributed management with no single points-of-failure is a challenging problem and probably the place where Tungsten still has the most work to do. Once you get it working, though, it's like magic. We have been focused on trying to make management procedures not just simple but wherever possible to do away with them completely by making the cluster self-managing.
  • Business rules rock. We picked the DROOLS rule engine to help control Tungsten and make it automatically reconfigure itself when data sources appear or fail. The result has been an incredibly flexible system that is easy to diagnose and extend. Just one example: floating IP address support for master databases took 2 hours to implement using a couple of new rules that work alongside the existing rule set. If you are not familiar with rules technology, there is still time to make a New Year's resolution to learn it in 2010. It's powerful stuff.
  • Clustering has to be transparent. I mean really transparent. We were in denial on this subject before we started to work closely with ISPs, where you don't have the luxury of asking people to change code. Tungsten Replicator is now close to a drop-in replacement for MySQL replication as result. We also implemented proxying based on packet inspection rather than translation and re-execution to raise throughput and reduce incompatibilities visible to applications.
  • Ship integrated, easy-to-use solutions. We made the mistake of releasing Tungsten into open source as a set of components that users had to integrate themselves. We have since recanted. As penance we now ship fully integrated clusters with simple installation procedures even for open source editions and are steadily extending the installations to cover not just our own software but also database and network configuration.
Beyond the features and learning experiences the real accomplishment of 2009 was to prove that integrated master/slave clusters can solve a wide range of problems from data protection to HA to performance scaling. In fact, what we have implemented actually works a lot better than I expected when we began to design the system back in 2007. (In case this sounds like a lack of faith, plausible ideas do not not always work in the clustering field.) If you have not tried Tungsten, download it and see if you share my opinion.

Finally, keep watching Tungsten in 2010. We are a long way from running out of ideas for making Tungsten both more capable and easier to use. It's going to be a great year.