Showing posts with label Proxies. Show all posts

Jun 17, 2009

Lots of New Tungsten Builds--Get 'Em While They're Hot

There is a raft of new Tungsten open source builds available for your replication and clustering pleasure. Over the last couple of days we uploaded new binary builds for Tungsten Replicator, Tungsten Connector, Tungsten Monitor, and Tungsten SQL Router. These contain the features described in my previous blog article, including even more bug fixes (36 on Tungsten Replicator alone) than I had expected as we had a debugging fest over the last few days that knocked off a bunch of issues. You can pick up the builds on the Tungsten download page. Docs are posted on the Tungsten wiki.

If you have questions, see problems with the builds, or just want to tell us how great they are, please post on the community forums or on the tungsten-discuss mailing list.

Our next open source release will be the Tungsten Manager, which is long overdue to join the family of regular builds. We are doing some polishing work on the state machine processing and group communications, after which the Manager will go out along with documentation on how to use it.

Apr 24, 2009

MySQL Conference Impressions and Slides

"Interesting" was probably the most overused word at the MySQL Conference that just ended yesterday. Everyone is waiting to find out more about the Oracle acquisition of Sun. As a community we need to find some synonyms or things will become very tiresome. Personally I vote for intriguing.

Here are slides for my presentations at the MySQL Conference as well as the parallel Percona Performance is Everything Conference. Thanks to everyone who attended as well as to the organizers. You had wonderful ideas and suggestions.


Finally, some short impressions on the conference. The two most intriguing trends were advances in hardware, especially memory and SSDs, as well as clouds. These are altering the economics of computing in fundamental ways: business costs as well as performance trade-offs in many of the basic algorithms for data management. Combined with the ferment of projects spinning off from MySQL and others, they are fueling an incredible burst of creative thinking about databases.

By comparison, Oracle consuming Sun is merely interesting.

Mar 29, 2009

Implementing Relaxed Consistency Database Clusters with Tungsten SQL Router

In December 2007 Werner Vogels posted a blog article entitled Eventual Consistency, since updated with a new article entitled Eventually Consistent - Revisited. In a nutshell it described how to scale databases horizontally across nodes by systematically trading off availability, strict data consistency, and partition resilience as defined by the CAP theorem. According to CAP, you can only have two of the three properties at any one time. The route to highly available and performant databases, according to Vogels, is eventual consistency, in which distributed database contents at some point converge to a single value but at any given time may be inconsistent across replicas. This is the idea behind databases like SimpleDB.

I read the original blog article at about 2am on a Sunday morning. It was like a thunderclap. Like transactions and state machines, CAP was one of those ideas that provide instant clarity to a large class of problems, in this case related to database availability and performance. But it also raised an immediate question: can't we apply CAP systematically on conventional SQL databases? That way you don't have to throw the relational database baby out with the strict consistency bathwater.

This is not an implausible idea. Most database engines have built-in master/slave replication to at least some degree, so there's no problem distributing data. (Shameless plug: If you don't like what your database provides, try ours.) The real problem is that you need to change how applications access the database. They need to implement CAP trade-offs in a consistent and easily understandable way. That's where the Tungsten SQL Router comes in.

Tungsten SQL Router is a thin Java JDBC driver wrapper that enhances conventional JDBC drivers to implement database session semantics based on CAP. SQL Router adds a "quality of service" or qos to each database session. Being programmers we had to invent our own terms, so here are the initial qos values.
  • RW_STRICT -- This session is used for writes; all data are strictly consistent, i.e., appear to all applications on RW_STRICT sessions as soon as they are written. In CAP terms you are picking data consistency + partition tolerance. (Vogels's article uses the term "causal consistency.")
  • RO_RELAXED -- This session is used for reads; data consistency is relaxed, i.e., represents data at an earlier point in time. In CAP terms you are picking availability + partition tolerance. (Vogels's article uses the term "monotonic reads.")
Clients can request the preferred quality of service whenever they create a new transaction. The SQL Router then takes care of connecting to a database that meets the semantics. Here's a typical Tungsten SQL Router URL (i.e., connection string) that routes connections to a MySQL master database:
jdbc:t-router://myservice/mydb?qos=RW_STRICT&createDatabaseIfNotExist=true
The SQL Router only steps in to select connections and to break them as necessary when databases go offline. It has almost no performance impact on Java applications, because we don't touch result sets and there are no proxy hops. That's an important requirement to achieve maximal application transparency.
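
As a rough illustration of how an application might pick a quality of service per session, here is a minimal Java sketch. It assumes only the URL form shown above; the RouterUrls class, its Qos enum, and the url and open helpers are hypothetical names invented for this example and are not part of the SQL Router itself. The open method uses the standard JDBC DriverManager call and would need the SQL Router driver on the classpath to actually connect.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper for illustration; not part of the Tungsten SQL Router API. */
final class RouterUrls {
    /** The two initial qos values named in this post. */
    enum Qos { RW_STRICT, RO_RELAXED }

    private RouterUrls() {}

    /** Builds a URL of the form jdbc:t-router://service/database?qos=VALUE. */
    static String url(String service, String database, Qos qos) {
        return "jdbc:t-router://" + service + "/" + database + "?qos=" + qos.name();
    }

    /** Opens a session through standard JDBC; requires the router driver on the classpath. */
    static Connection open(String service, String database, Qos qos,
                           String user, String password) throws SQLException {
        return DriverManager.getConnection(url(service, database, qos), user, password);
    }

    public static void main(String[] args) {
        // Writes would use the strict URL, reads the relaxed one.
        System.out.println(url("myservice", "mydb", Qos.RW_STRICT));
        System.out.println(url("myservice", "mydb", Qos.RO_RELAXED));
    }
}
```

The point of the sketch is that the application's only decision is which qos to ask for; everything else about locating a suitable database stays behind the URL.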

Making CAP work properly for conventional applications is not entirely straightforward, which is one of the main reasons why you don't want the logic to be a part of your application. Here are some of the key features that Tungsten SQL Router provides.
  • Distributed database services. SQL Router groups databases into "services." Each database in the service is defined using a simple resource file that defines its name, location, and role (e.g., master or slave).
  • Remote management interfaces. Databases fail or go offline for maintenance and cluster resources change over time. Strict consistency connections in fact explicitly choose to fail when the database is no longer available rather than access old data, so you must handle failover easily. Tungsten SQL Router has a built-in JMX administrative interface that allows you to promote a slave database to become a master, take databases offline, bring them back online, etc., without disturbing or even necessarily notifying applications.
  • Support for non-Java applications. The world is a diverse place and not every application is written in Java. You can embed the SQL Router in the Tungsten Connector, a proxy that allows native MySQL and PostgreSQL applications (Perl, PHP, Ruby, name your favorite...) to connect without library changes or even changing connection strings.
  • Integration with connection pools. SQL Router provides call-backs that can be used to let application connection pools know when to give up connections. A little cooperation here makes things like failover work much more smoothly.
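
To make the ideas in the list above a little more concrete, here is a conceptual Java sketch of a service whose databases carry a name, location, and role, with selection by qos and an administrative promote operation. This is not the SQL Router's code: every class and method name is hypothetical, and the choice to let relaxed reads fall back to the master when no slave is online is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Conceptual sketch only; all names here are hypothetical. */
final class QosRoutingSketch {
    enum Role { MASTER, SLAVE }
    enum Qos { RW_STRICT, RO_RELAXED }

    /** One database in a service: name, location, and role, plus an online flag. */
    static final class Resource {
        final String name;
        final String location;
        Role role;
        boolean online = true;

        Resource(String name, String location, Role role) {
            this.name = name;
            this.location = location;
            this.role = role;
        }
    }

    private final List<Resource> resources = new ArrayList<>();

    void add(Resource resource) {
        resources.add(resource);
    }

    /** Picks a database for a new session, or empty if none satisfies the qos. */
    Optional<Resource> select(Qos qos) {
        if (qos == Qos.RW_STRICT) {
            // Strict sessions fail rather than read old data from a slave.
            return firstOnline(Role.MASTER);
        }
        Optional<Resource> slave = firstOnline(Role.SLAVE);
        // Assumption of this sketch: relaxed reads may fall back to the master.
        return slave.isPresent() ? slave : firstOnline(Role.MASTER);
    }

    /** Administrative promotion: the named database becomes master, all others slaves. */
    void promote(String name) {
        boolean known = resources.stream().anyMatch(r -> r.name.equals(name));
        if (!known) {
            throw new IllegalArgumentException("Unknown database: " + name);
        }
        for (Resource r : resources) {
            r.role = r.name.equals(name) ? Role.MASTER : Role.SLAVE;
        }
    }

    private Optional<Resource> firstOnline(Role role) {
        return resources.stream().filter(r -> r.online && r.role == role).findFirst();
    }

    public static void main(String[] args) {
        QosRoutingSketch service = new QosRoutingSketch();
        Resource db1 = new Resource("db1", "host1:3306", Role.MASTER);
        service.add(db1);
        service.add(new Resource("db2", "host2:3306", Role.SLAVE));
        db1.online = false;      // master goes offline for maintenance
        service.promote("db2");  // an administrator promotes the slave
        System.out.println(service.select(Qos.RW_STRICT).map(r -> r.name).orElse("none"));
    }
}
```

In the sketch, applications never see promote; they simply ask for a session with a given qos and get whichever database currently satisfies it.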
There are other features but it's probably simplest if you visit the Tungsten Project on SourceForge.net, read the wiki documentation, download a copy, and try it out for yourself. There's general information about Tungsten on our community website. Note: our community site is due for an update shortly to add more information about SQL Router and other new projects we are releasing. For the next few days please check out SourceForge.net.

Finally, here's an interesting thought that shows the power of applying CAP semantics in SQL applications. So far we have been talking about database replicas. However, SQL Router relaxed consistency sessions could just as easily read query data from a distributed cache like memcached. An application that specifies qos=RO_RELAXED on a connection is saying it will accept possibly out-of-date data in return for availability. Semantically there is no difference between a cache and a database replica--you can substitute any conforming storage implementation. Exploiting that idea pretty much defines our long-term roadmap for the SQL Router.
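
Here is one way the substitution idea could look in Java, offered as a sketch rather than a description of anything SQL Router does today. The RelaxedSource interface and both implementations are hypothetical: an in-memory map stands in for a distributed cache like memcached, and a plain function stands in for a query against a slave database.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Conceptual sketch only; all names here are hypothetical. */
final class RelaxedReadSketch {
    /** Anything that can answer a read with possibly out-of-date data. */
    interface RelaxedSource {
        Optional<String> read(String query);
    }

    /** In-memory map standing in for a distributed cache such as memcached. */
    static final class CacheSource implements RelaxedSource {
        private final Map<String, String> entries = new ConcurrentHashMap<>();

        void put(String query, String result) {
            entries.put(query, result);
        }

        @Override
        public Optional<String> read(String query) {
            return Optional.ofNullable(entries.get(query));
        }
    }

    /** Function standing in for a query against a slave database. */
    static final class ReplicaSource implements RelaxedSource {
        private final Function<String, String> replica;

        ReplicaSource(Function<String, String> replica) {
            this.replica = replica;
        }

        @Override
        public Optional<String> read(String query) {
            return Optional.ofNullable(replica.apply(query));
        }
    }

    /** Tries each source in order and returns the first answer found. */
    static Optional<String> read(List<RelaxedSource> sources, String query) {
        for (RelaxedSource source : sources) {
            Optional<String> result = source.read(query);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        CacheSource cache = new CacheSource();
        cache.put("select count(*) from t", "42 (cached)");
        RelaxedSource replica = new ReplicaSource(q -> "result from replica for: " + q);
        List<RelaxedSource> sources = Arrays.asList(cache, replica);
        System.out.println(read(sources, "select count(*) from t").orElse("none"));
        System.out.println(read(sources, "select 1").orElse("none"));
    }
}
```

Because the caller has already agreed to accept stale data, it cannot tell, and does not need to care, which kind of source answered.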

In summary, SQL Router provides a simple model so that applications can choose whether they want availability or full data consistency while ensuring basic properties like partition resilience. These semantics are key to extending the scale-out database design model to increasingly large clusters and, equally important, to making that model easy to use for clusters of all sizes. Tungsten SQL Router is a work in progress, but the idea of using CAP semantics really seems to have legs. I hope you will try it out and let us know what you think.

p.s., I would like to thank David Van Couvering for pointing out Werner Vogels's article in his blog as well as my colleague Ed Archibald for getting the SQL Router off the ground. Nice working with you guys. :)

Sep 14, 2008

Java Service Wrapper Is *Very* Handy

If you write network services using Java, you should look into the Java Service Wrapper (JSW). The JSW turns Java programs from weak delicate creatures easily killed by an errant Ctrl-C into robust network services that boot up automatically, ignore most signals, and restart automatically following crashes. It's free for open source programs and has very reasonable licensing fees for commercial software.

We use JSW on several of our projects including the Tungsten Replicator and the Tungsten Connector. I just checked in a new project on our Tungsten Commons site with an Ant script that automatically copies the open source versions of JSW into a project directory with a conventional layout including bin and lib directories. Check it out here if you would like an example of how to automate addition of JSW wrappers to your own Java projects.

Jul 13, 2008

Myosotis Connector: a Fast SQL Proxy for MySQL and PostgreSQL

SQL proxies have been very much in the news lately, especially for open source databases. MySQL Proxy and PG-Pool are just two examples. Here is another proxy you should look at: Myosotis.

Myosotis is a 'native-client' to JDBC proxy for MySQL and PostgreSQL clients. We originally developed it to allow clients to attach to our Java-based middleware clusters without using a JDBC driver. Myosotis parses the native wire protocol request from the client, issues a corresponding JDBC call, and returns the results back to the client. As you can probably infer, it's written in Java. "Myosotis" incidentally is the scientific name for "Forget-Me-Not," a humble but strikingly beautiful flower.

Myosotis is still rather simple but it already has a couple of very interesting features. First, it works for both MySQL and PostgreSQL. That's a good start. Wire protocols are very time-consuming to implement. Another feature is that Myosotis is really fast. This deserves explanation and some proof.

As other people have discovered, proxying is very CPU-intensive. It also involves a lot of concurrency, since a proxy may have to manage hundreds or even thousands of connections. Java is already fast in single threads--after a few runs through method invocations, the JVM has compiled the bytecodes down to native machine code. In addition, Java uses multiple CPUs relatively efficiently. Myosotis uses a thread per connection. Java automatically schedules these on all CPUs and optimizes memory access in multi-core environments.

We can show Myosotis throughput empirically using Bristlecone, an open source test framework we wrote to measure performance of database clusters. We test proxy throughput by issuing do-nothing queries as quickly as possible with varying numbers of threads. The following run compares Myosotis against a uni/cluster 2007.1 process (a much more complex commercial middleware clustering software) and MySQL Proxy 0.6.1 running without Lua scripts. The proxy test environment is a Dell SC 1425 with 4 cores running CentOS5 and MySQL 5.1.23.

The results are striking. Myosotis gets between 3000 and 3500 queries per second when 8 threads are simultaneously running queries. To demonstrate processor scaling, run htop when the Myosotis Connector is being tested. You see something like this--a nice distribution across 4 cores.

Myosotis is a very simple proxy now but it has the foundation to create something great. We have big plans for Myosotis--it's a key part of our Tungsten architecture for database scale-out, which we will be rolling out later in the summer. The next step is to add routing logic so that we can implement load balancing and failover. We'll be doing that over the next few months. Meanwhile, if you want to see how fast Java proxies for SQL can be, check us out at http://myosotis.continuent.org.

p.s., If you want to repeat the test shown here on your own proxy, download Bristlecone and try it out. I used the ReadSimpleScenario test, which is specifically designed to check middleware latency.
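
For readers who just want the flavor of the measurement, here is a small Java sketch of the general approach described above: run a do-nothing operation as fast as possible on a varying number of threads and report operations per second. It is not Bristlecone code and does not reproduce ReadSimpleScenario; the ThroughputSketch class and its measure method are hypothetical, and the empty Runnable is a stand-in for a real query sent through a proxy.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Conceptual sketch only; not Bristlecone code. */
final class ThroughputSketch {
    /** Runs op on the given number of threads for roughly millis ms; returns operations per second. */
    static double measure(int threads, long millis, Runnable op) {
        final AtomicLong count = new AtomicLong();
        final long durationNanos = millis * 1_000_000L;
        final long start = System.nanoTime();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                // Each worker issues operations back to back until time runs out.
                while (System.nanoTime() - start < durationNanos) {
                    op.run();
                    count.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while measuring", e);
            }
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return count.get() / seconds;
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 2, 4, 8}) {
            // Replace the empty body with a real do-nothing query to test a proxy.
            double opsPerSecond = measure(threads, 200, () -> { });
            System.out.printf("%d threads: %.0f ops/sec%n", threads, opsPerSecond);
        }
    }
}
```

With a real query in place of the empty body, the numbers reflect round-trip latency through the middleware rather than raw loop speed, which is the quantity of interest when comparing proxies.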