Sunday, July 13, 2008

Myosotis Connector: a Fast SQL Proxy for MySQL and PostgreSQL

SQL proxies have been very much in the news lately, especially for open source databases. MySQL Proxy and PG-Pool are just two examples. Here is another proxy you should look at: Myosotis.

Myosotis is a 'native-client' to JDBC proxy for MySQL and PostgreSQL clients. We originally developed it to allow clients to attach to our Java-based middleware clusters without using a JDBC driver. Myosotis parses the native wire protocol request from the client, issues a corresponding JDBC call, and returns the results back to the client. As you can probably infer, it's written in Java. "Myosotis" incidentally is the scientific name for "Forget-Me-Not," a humble but strikingly beautiful flower.

Myosotis is still rather simple but it already has a couple of very interesting features. First, it works for both MySQL and PostgreSQL. That's a good start. Wire protocols are very time-consuming to implement. Another feature is that Myosotis is really fast. This deserves explanation and some proof.

As other people have discovered, proxying is very CPU-intensive. It also involves a lot of concurrency, since a proxy may have to manage hundreds or even thousands of connections. Java is already fast in single threads--after a few runs through method invocations, the JVM has compiled the bytecodes down to native machine code. In addition, Java uses multiple CPUs relatively efficiently. Myosotis uses a thread per connection. Java automatically schedules these on all CPUs and optimizes of memory access in multi-core environment.

We can show Myosotis throughput empirically using Bristlecone, an open source test framework we wrote to measure performance of database clusters. We test proxy throughput by issuing do-nothing queries as quickly as possible with varying numbers of threads. The following run compares Myosotis against a uni/cluster 2007.1 process (a much more complex commercial middleware clustering software) and MySQL Proxy 0.6.1 running without Lua scripts. The proxy test environment is a Dell SC 1425 with 4 cores running CentOS5 and MySQL 5.1.23.

The results are striking. Myosotis gets between 3000 and 3500 queries per second when 8 threads are simultaneously running queries. To demonstrate processor scaling, run htop when the Myosotis Connector is being tested. You see something like this--a nice distribution across 4 cores.
Myosotis is a very simple proxy now but it has the foundation to create something great. We have big plans for Myosotis--it's a key part of our Tungsten architecture for database scale-out, which we will be rolling out later in the summer. The next step is to add routing logic so that we can implement load balancing and failover. We'll be doing that over the next few months. Meanwhile, if you want to see how fast Java proxies for SQL can be, check us out at at http://myosotis.continuent.org.

p.s., If you want to repeat the test shown here on your own proxy, download Bristlecone and try it out. I used the ReadSimpleScenario test, which is specifically designed to check middleware latency.

Wednesday, July 2, 2008

What's Your Favorite Database Replication Feature?

Replication is one of the most flexible technologies available for databases. We are implementing a new open-source, database-neutral replication product that works with MySQL, Oracle, and PostgreSQL. Naturally we've done a lot of thinking about the feature set. It's tough to pick any single feature as the most important, but one that really stands out is optional statement replication. Here's why.

Database replication products tend to replicate row changes and DDL. However, Mark Callaghan has a great example of why you want to replicate statements as well--it enables Maatkit distributed consistency checking to work. If you dissect the mk-table-checksum --replicate command you will see that it uses a nice trick. The SQL queries generate checksums into the master table and then replicate as statements rather than row updates out to slaves. That way the slaves recompute the checksum locally at the same point in the overall transaction history. Very elegant!

Replicated consistency checks are a wonderful feature for large systems that can't afford to stop in order to compare tables between servers. However, you cannot use it if your database cannot replicate statements. As Mark points out, not even all MySQL engines do this. The proposed replication additions for PostgreSQL won't support it either.

Optional statement replication is really the best kind of feature: it is useful on its own, but also enables features like consistency checking and other nice administrative tricks. We're going to put a "worm-hole" in our replication engine that allows applications to invoke statement replication at the SQL level. Can you guess how we are going to do it? If not, you'll have to wait until we release. :)

So what's your favorite database replication feature?

Scaling Databases Using Commodity Hardware and Shared-Nothing Design