Sunday, March 29, 2009

Implementing Relaxed Consistency Database Clusters with Tungsten SQL Router

In December 2007 Werner Vogels posted a blog article entitled Eventual Consistency, since updated with a new article entitled Eventually Consistent - Revisited. In a nutshell it described how to scale databases horizontally across nodes by systematically trading off availability, strict data consistency, and partition resilience as defined by the CAP theorem. According to CAP, you can only have two of three of these properties at any one time. The route to highly available and performant databases, according to Vogels, is eventual consistency in which distributed database contents at some point converge to a single value but at any given time may be inconsistent across replicas. This is the idea behind databases like SimpleDB.

I read the original blog article at about 2am on a Sunday morning. It was like a thunderclap. Like transactions and state machines, CAP was one of those ideas that provide instant clarity to a large class of problems, in this case related to database availability and performance. But it also raised an immediate question: can't we apply CAP systematically on conventional SQL databases? That way you don't have to throw away the relational database baby away with the strict consistency bathwater.

This is not an implausible idea. Most database engines have built-in master/slave replication to at least some degree, so there's no problem distributing data. (Shameless plug: If you don't like what your database provides, try ours.) The real problem is that you need to change how applications access the database. They need to implement CAP trade-offs in a consistent and easily understandable way. That's where the Tungsten SQL Router comes in.

Tungsten SQL Router is a thin Java JDBC driver wrapper that enhances conventional JDBC drivers to implement database session semantics based on CAP. SQL Router adds a "quality of service" or qos to each database session. Being programmers we had to invent our own terms, so here are the initial qos values.
  • RW_STRICT -- This session is used for writes; all data are strictly consistent, i.e., appear to all applications on RW_STRICT sessions as soon as they are written. In CAP terms you are picking data consistency + partition tolerance. (Vogel's article uses the term "causal consistency.")
  • RO_RELAXED -- This session is used for reads; data consistency is relaxed, i.e., represents data at an earlier point in time. In CAP terms you are picking availability + partition tolerance. (Vogel's article uses the term "monotonic reads.")
Clients can request the preferred quality of service whenever they create a new transaction. The SQL Router then takes care of connecting to a database that meets the semantics. Here's a typical Tungsten SQL Router URL (i.e., connection string) that routes connections to a MySQL master database:
jdbc:t-router://myservice/mydb?qos=RW_STRICT&createDatabaseIfNotExist=true
The SQL Router only steps in to select connections and to break them as necessary when databases go offline. It has almost no performance impact on Java applications, because we don't touch result sets and there are no proxy hops. That's an important requirement to achieve maximal application transparency.

Making CAP work properly for conventional applications is not entirely straightforward, which is one of the main reasons why you don't want the logic to be a part of your application. Here are some of the key features that Tungsten SQL Router provides.
  • Distributed database services. SQL Router groups databases into "services." Each database in the service is defined using a simple resource file that defines its name, location, and role (e.g., master or slave).
  • Remote management interfaces. Databases fail or go offline for maintenance and cluster resources change over time. Strict consistency connections in fact explicitly choose to fail when the database is no longer available rather than access old data, so you must handle failover easily. Tungsten SQL Router has a built-in JMX administrative interface that allows you to promote a slave database to become a master, take databases offline, bring them back online, etc., without disturbing or even necessarily notifying applications.
  • Support for non-Java applications. The world is a diverse place and not every application is written in Java. You can embed the SQL Router in the Tungsten Connector, a proxy that allows native MySQL and PostgreSQL applications (Perl, PHP, Ruby, name your favorite...) to connect without library changes or even changing connection strings.
  • Integration with connection pools. SQL Router provides call-backs that can be used to let application connection pools know when to give up applications. A little cooperation here makes things like failover work much more smoothly.
There are other features but it's probably simplest if you visit the Tungsten Project on SourceForge.net, read the wiki documentation, download a copy, and try it out for yourself. There's general information about Tungsten on our community website. Note: our community site is due for an update shortly to add more information about SQL Router and other new projects we are releasing. For the next few days please check out SourceForge.net.

Finally, here's an interesting thought that shows the power of applying CAP semantics in SQL applications. So far we have been talking about database replicas. However, SQL Router relaxed consistency sessions could just as easily read query data from a distributed cache like memcached. An application that specifies qos=RO_RELAXED on a connection is saying it will accept possibly out-of-date data in return for availability. Semantically there is no difference between a cache and a database replica--you can substitute any conforming storage implementation. Exploiting that idea pretty much defines our long-term roadmap for the SQL Router.

In summary SQL Router provides a simple model so that applications can choose whether they want availability or full data consistency while ensuring basic properties like partition resilience. These semantics are key to extending the scale-out database design model to increasingly large clusters, and equally important, to make that model easy to use for clusters of all sizes. Tungsten SQL Router is a work in progress, but the idea of using CAP semantics really seems to have legs. I hope you will try it out and let us know what you think.

p.s., I would like thank David Van Couvering for pointing out Werner Vogel's article in his blog as well as my colleague Ed Archibald for getting the SQL Router off the ground. Nice working with you guys. :)

Tuesday, March 17, 2009

Announcing Tungsten Monitor

Yesterday I posted about our release of Tungsten FSM, a package for building state machines in Java. Today we are publishing Tungsten Monitor, the second of four new Tungsten packages we are releasing on SourceForge.net during the month of March. Tungsten Monitor offers fast, pluggable resource monitoring for database clusters. We have a couple of specific monitor types already implemented; you can add new ones with minimal Java code.

Tungsten Monitor is focused on a single problem: providing continuous notifications of the state of resources in a cluster. Each monitor executes a simple process loop to check the resource and broadcast the resulting status. The status information answers the following questions:
  • What type of resource is this?
  • What is its name?
  • Is the resource up or down?
  • If a replica database, what is the latency compared to a master?
Tungsten Monitor is exceedingly simple--the current version has a total of 13 Java classes. (Who says Java needs to be complex?) Despite this simplicity the monitor has at least three very interesting features.

First, there's no centralized agent framework. You just start a monitor on a replicator, database, etc., and it starts to generate notifications. The off-the-shelf configuration monitors Tungsten Replicator--to get started you unpack code, copy one file, and start. That's it. The monitor figures out everything else automatically.

Second, Tungsten Monitor broadcasts notifications using group communications, which is a protocol for reliable, broadcast messaging between a set of processes known as a "group." Manager processes can tell that a new resource is available because its monitor joins the group and starts to broadcast notifications that the resource is up. If the new resource is a database, this is enough for a smart proxying service to start sending traffic to it automatically without any further management intervention.

Third, you can add new resource checks and notification methods yourself. With just 13 Java classes to start, you obviously won't be getting a lot off the shelf. :) We will add MySQL, PostgreSQL, and Oracle monitors shortly but any competent Java programmer could do the same in an hour or two without waiting for us.

In the meantime, you can download Tungsten Monitor source and binary builds from SourceForge.net. Also, here's a wiki article that describes basic installation and configuration. Take it for a spin, especially if you are using Tungsten Replicator already.

Interested? I hope so. Tungsten Monitor integrates with another of our March projects to route SQL connections automatically to available databases in a master/slave or master/master cluster. Stay tuned for more on that next week. Things are just starting to get interesting around here.

p.s., I did a bit of coding on the monitor but it really belongs to Gillles Rayrat, a colleague of mine in Grenoble. Nice job, Gilles!

Sunday, March 15, 2009

Announcing Tungsten Finite State Machine Library

It is my pleasure to announce publication of the Tungsten Finite State Machine Library (Tungsten FSM) as an open source package hosted on SourceForge.net. This is the first of four new components for database clustering and replication that we will be releasing into open source during the month of March.

Tungsten FSM is a Java library for implementing in-memory state machines. It is lightweight and very simple to use. Each Tungsten FSM state machine tracks the state of a particular instance (i.e., object) in Java. The model provides standard features of state machines including states, transitions, actions, and error handling. Source and binary downloads are available here--there is also wiki documentation that explains how to use Tungsten FSM with code examples.

Here's a little background on the Tungsten FSM library and how it arose. State machines let you model the behavior of complex systems including system states and input/outputs in a simple and understandable way. They are as important for distributed systems as transactions are for databases. Among other things, state machines enable you to ensure that network services behave deterministically when presented with multiple, concurrent inputs. That determinism in turn is a fundamental requirement for organizing groups of processes into database clusters, which is what we do at Continuent.

When we embarked on development of Tungsten Replicator, it was immediately apparent we would need to implement state machines. Most of the available libraries, such JBoss jBPM, were heavyweight or otherwise difficult to embed. We therefore wrote a lightweight library of our own, adding features as we ran into practical implementation issues like dealing with errors. Tungsten FSM helps us organize code in services--for instance, it has been very easy to add new administrative operations to the Tungsten Replicator simply by adding more state transitions.

However, you don't have to take my word for it. Try out Tungsten FSM and let me know how you like it. For more information on Tungsten in general, please visit our community pages.

p.s., I'm posting this article to aggregators for MySQL and PostgreSQL even though it's not directly related to databases per se. State machines turn out to be essential to database clustering and management, as you'll see in some of the succeeding articles on this blog.

Scaling Databases Using Commodity Hardware and Shared-Nothing Design