The Scale-Out Blog

Mar 28, 2010

New Tungsten Software Releases for MySQL and PostgreSQL

I would like to announce a couple of new Tungsten versions available for your database clustering enjoyment. As most readers of this blog are aware, Tungsten allows users to create highly available data services that include replicated copies, distributed management, and application connectivity using unaltered open source databases. We are continually improving the software and have a raft of new features coming out this year.

First, there is a new Tungsten 1.2.3 maintenance release available in both commercial and open source editions. You can get the commercial version from the Continuent website, while the open source version is available on SourceForge.

The Tungsten 1.2.3 release focuses on improvements for MySQL users including the following:

Transparent session consistency for multi-tenant applications. Applications that follow a few simple conventions, such as sharding tenant data by database, get automatic read scaling to slaves without code changes.
A greatly improved script for purging history on Tungsten Replicator.
Fixes to binlog extraction to handle enum and set data types correctly.
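
As a rough illustration of how session consistency can work, the sketch below tracks, for each session and database, the position of the session's last write, and sends a read to a slave only after that slave has applied at least that position. The class name `SessionRouter`, the integer positions, and the host names are hypothetical; this is a sketch of the idea under those assumptions, not Tungsten Connector code.

```python
class SessionRouter:
    """Route reads to a slave only if it has caught up with the session's writes.

    Hypothetical sketch: positions are plain integers (e.g. a global sequence
    number) and slave positions are supplied by the caller.
    """

    def __init__(self, slave_positions):
        # slave name -> highest replicated position applied on that slave
        self.slave_positions = slave_positions
        # (session, database) -> position of that session's last write there
        self.last_write = {}

    def record_write(self, session, database, position):
        self.last_write[(session, database)] = position

    def route_read(self, session, database):
        # Writes to other databases do not matter, which is why sharding
        # tenants by database lets most reads go to slaves.
        needed = self.last_write.get((session, database), 0)
        for slave, applied in sorted(self.slave_positions.items()):
            if applied >= needed:
                return slave
        return "master"  # no slave is fresh enough; read from the master
```

A session that just wrote to one tenant database still gets slave reads on every other database, because only its own writes to the database being read are checked.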

By far the biggest improvement in this release is Tungsten product documentation, including major rewrites for the guides covering management and connectivity. Even the Release Notes are better. If you want to find out how Tungsten works, start with the new Tungsten Concepts and Administration Guide.

Second, there's a new Tungsten 1.3 release coming out soon. Commercial versions are already in use at selected customer sites, and you can build the open source version by downloading code from SVN on SourceForge.

The Tungsten 1.3 release sports major feature additions in the following areas:

A new replicator architecture that lets you manage non-Tungsten replication and configure very flexible replication flows, both to use multi-core systems more effectively and to implement complex replication topologies. The core processing loop for replication can now cycle through 700,000 events per second on my laptop--it's really quick.
Much improved support for PostgreSQL warm standby clustering as well as provisional management of new PostgreSQL 9 features like streaming replication and hot standby.
Replication support for just about everything in the MySQL binlog: large transactions, unsigned characters, session variables, various permutations of character sets and binary data, and the ability to download binlog files through the MySQL client protocol. If you can put it in the binlog we can replicate it.
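
A staged replication flow can be pictured as a chain of stages (for example extract, filter, apply) joined by bounded queues, with each stage on its own thread so that stages overlap. The sketch below is a hypothetical illustration of that structure, not the Tungsten 1.3 replicator API. Note that in CPython the global interpreter lock keeps these threads from running Python code on several cores at once, so the sketch shows the shape of the design rather than its performance.

```python
import queue
import threading

STOP = object()  # sentinel passed down the chain to shut each stage down


def run_stage(func, inbox, outbox):
    """Apply func to each event; a None result filters the event out."""
    while (event := inbox.get()) is not STOP:
        result = func(event)
        if result is not None:
            outbox.put(result)
    outbox.put(STOP)


def run_pipeline(events, funcs, capacity=1000):
    """Run funcs as a chain of threads joined by bounded queues."""
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(funcs) + 1)]

    def feed():
        for event in events:
            queues[0].put(event)  # blocks when the first stage falls behind
        queues[0].put(STOP)

    threads = [threading.Thread(target=feed)] + [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for thread in threads:
        thread.start()
    applied = []
    while (event := queues[-1].get()) is not STOP:
        applied.append(event)
    for thread in threads:
        thread.join()
    return applied
```

For example, `run_pipeline(range(10), [lambda e: e if e % 2 else None, lambda e: e * 10])` drops even events in the first stage and returns `[10, 30, 50, 70, 90]`. Because each stage is a single thread reading a FIFO queue, event order is preserved end to end.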

We also have provisional support for Drizzle thanks to Marcus Eriksson, plus a raft of other improvements. This has been a huge amount of work all around, so I hope you'll enjoy the results.

P.s., Contact Continuent if you want to be a beta test site for Tungsten 1.3.

Mar 22, 2010

Replication and More Replication at 2010 MySQL Conference

Database replication is still interesting after all these years. Two of my talks on replication technology have been accepted for the upcoming 2010 MySQL Conference. Here are the summaries.

Clustering for the Masses - A Gentle Introduction to Tungsten for MySQL

Not Your Grandpa’s Replication - The New Wave of MySQL Replication and How It Helps Your Applications

The first talk is a solo presentation covering Tungsten, which creates highly available and scalable database clusters using vanilla MySQL databases linked by flexible replication. I'll describe how it works and some cool things you can do like zero-downtime upgrades and session-based performance scaling. If you want to know how Tungsten can help you, this is a good time to find out.

The second talk is a joint effort with Jay Pipes covering the issues, such as big data, that are driving replication technology, and the solutions to these problems available to MySQL users. We'll lay out our vision of where things are going to help you pick the right technology for your next project. Jay and I are also soliciting input on this talk from the Drizzle community, among others. If you are interested, check out the thread on drizzle-discuss or post to this blog.

Finally, I'll be around for much of the MySQL conference, so if you are interested in Tungsten or data replication in general or just want to hang out, please look me up. See you in Santa Clara!

Tungsten and PostgreSQL 9 at PG-East Conference

My Continuent colleagues Linas Virbalas and Alex Alexander will be giving a talk entitled Building Tungsten Clusters with PostgreSQL Hot Standby and Streaming Replication later this week at the PG-East Conference in Philadelphia. I saw the demo last week and it's quite impressive. You can flip the master and slaves for maintenance, open slaves for reads, fail over automatically, etc. It's definitely worth attending if you are in Philly this week.

Looking beyond the conference, we plan to be ready to support Tungsten clusters on PostgreSQL 9 as soon as it goes into production. Everything we have seen so far indicates that the new log streaming and hot standby features are going to be real hits. They not only help applications, but from a clustering perspective queryable slaves with minimal replication lag are also a lot easier to manage. Alex and Linas will have more to say about that during their presentation.

Meanwhile, I'm sorry to miss the PG-East conference but wish everyone who will be attending a great time. See you later this year at PG-West!

Jan 28, 2010

MariaDB is Thinking about Fixing MySQL Replication and You Can Help

In case you have not noticed, MariaDB is joining the list of projects thinking about how to improve MySQL replication. The discussion thread starts here on the maria-developers mailing list.

This discussion was jointly started by Monty Program, Codership, and Continuent (my employer) in an effort to push the state of the art beyond features offered by the current MySQL replication. Now that things are starting to die down with the Oracle acquisition, we can get back to the job of making the MySQL code base substantially better. The first step in that effort is to get a discussion going to develop our understanding of the replication problems we think are most important and outline a strategy to solve them.

Speaking as a developer on Tungsten, my current preference would be to improve the existing MySQL replication. I suspect this would also be the preference of most current MySQL users. However, there are also more radical approaches on the table, for example from our friends at Codership, who are developing an innovative form of multi-master replication based on group communications and transaction certification. That's a good thing, as we want a range of contrasting ideas that take full advantage of the creativity in the community on the topic of replication.

If you have interest in improving MySQL replication please join the MariaDB project and contribute your thoughts. It should be an interesting conversation.

Jan 27, 2010

Tungsten 1.2.2 Release is Out - Faster, More Stable, More Fun

Release 1.2.2 of Tungsten Clustering is available on SourceForge as well as through the Continuent website. The release contains mostly bug fixes in the open source version but there are also two very significant improvements of interest to all users.

The manager and monitoring capabilities of Tungsten are now fully integrated on a single group communications channel. This fixes a number of problems that caused data sources not to show up properly in older versions.
We are officially supporting a new Tungsten Connector capability for MySQL called pass-through mode, which allows us to proxy connections by transferring network blocks directly rather than translating native request protocol to JDBC calls. Our tests show that it speeds up throughput by as much as 200% in some cases.
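
The idea behind pass-through mode can be sketched as a byte-level relay: the proxy copies network blocks between client and server without decoding the protocol. The sketch below uses Python's asyncio streams; the function names and ports are hypothetical, and it illustrates the concept rather than how Tungsten Connector implements it (it ignores SQL routing and failover entirely).

```python
import asyncio


async def pipe(reader, writer):
    """Copy raw bytes one way until EOF; the protocol is never parsed."""
    try:
        while chunk := await reader.read(65536):
            writer.write(chunk)
            await writer.drain()
    except ConnectionError:
        pass  # the peer went away; just close our side below
    finally:
        writer.close()


async def handle_client(client_reader, client_writer, backend_host, backend_port):
    """Open a backend connection and shuttle blocks in both directions."""
    backend_reader, backend_writer = await asyncio.open_connection(
        backend_host, backend_port)
    await asyncio.gather(
        pipe(client_reader, backend_writer),
        pipe(backend_reader, client_writer),
    )


async def start_proxy(listen_port, backend_host, backend_port):
    """Listen on localhost and forward every connection to one backend."""
    return await asyncio.start_server(
        lambda r, w: handle_client(r, w, backend_host, backend_port),
        "127.0.0.1", listen_port)
```

To put it in front of a local MySQL server you might call `server = await start_proxy(3307, "127.0.0.1", 3306)` and then `await server.serve_forever()` inside an event loop (3307 is an arbitrary example port). Because nothing is decoded, each block costs only a copy, which is the intuition behind the throughput gain; the trade-off is that a pure relay cannot inspect or rewrite statements.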

The commercial version has additional features like PostgreSQL warm standby clustering, add-on rules to manage master virtual IP addresses and other niceties. If you are serious about replication and clustering it is worth a look.

This is a good time to give a couple of reminders for Tungsten users. First, Tungsten is distributed as a single build that integrates replication, management, monitoring, and connectivity. The old Tungsten Replicator and Myosotis builds are going away. Second, we have a single set of docs on the Continuent website that covers both open source and commercial distributions.

With that, enjoy the new release. If you are using the open source edition, please post your experiences in the Tungsten community forums or write a blog article. We would love to hear from you.

P.s., We have added Drizzle support thanks to a patch from Marcus Eriksson but it's not in 1.2.2. For that you need to build directly from the SVN trunk. Drizzle support will be out in binary builds as part of Tungsten version 1.3.

Jan 17, 2010

What's in Your Binlog?

Over the last couple of months I have run into a number of replication problems where I needed to run reports on MySQL binlogs to understand what sort of updates servers were processing as well as to compute peak and average throughput. It seems that not even Maatkit has a simple tool to report on binlog contents, so I wrote a quick Perl script called binlog-analyze.pl to summarize output from mysqlbinlog, which is the standard tool to dump binlogs to text.

The script operates as a filter with the following syntax:

Usage: ./binlog-analyze.pl [-h] [-q] [-v]
Options:
  -h : Print help
  -q : Suppress excess output
  -v : Print verbosely for debugging

To get a report, you just run mysqlbinlog on a binlog file and pipe the results into binlog-analyze.pl. Here is a typical invocation and its output. The -q option keeps the output as short as possible.

$ mysqlbinlog /var/lib/mysql/mysql-bin.001430 | ./binlog-analyze.pl -q
===================================
| SUMMARY INFORMATION             |
===================================
Server Version    : 5.0.89
Binlog Version    : 4
Duration          : 1:03:37 (3817s)

===================================
| SUMMARY STATISTICS              |
===================================
Lines Read        :        17212685
Events            :         3106006
Bytes             :      1073741952
Queries           :         2235077
Xacts             :          817575
Max. Events/Second:            5871.00
Max. Bytes/Second :         1990077.00
Max. Event Bytes  :          524339
Avg. Events/Second:             813.73
Avg. Bytes/Second :          281305.20
Avg. Queries/Sec. :             585.56
Avg. Xacts/Sec.   :             214.19
Max. Events Time  :         9:01:02

===================================
| EVENT COUNTS                    |
===================================
Execute_load_query   :           10
Intvar               :        53160
Query                :      2235077
Rotate               :            1
Start                :            1
User_var             :          182
Xid                  :       817575

===================================
| SQL STATEMENT COUNTS            |
===================================
begin                :       817585
create temp table    :            0
delete               :        31781
insert               :           20
insert into          :       411266
select into          :            0
update               :       633857

There are lots of things to see in the report, so here are a few examples. For one thing, at peak the server writes 5871 events and close to 2 MB of log output per second. That's busy but not enormously so--MySQL can easily write over 10,000 events per second into the binlog on workhorse 4-core machines. The application(s) connected to the database execute a large number of fast, short transactions, typical of data logging operations such as storing session data. We can also see from the Execute_load_query events that somebody executed MySQL LOAD DATA INFILE commands. That's interesting to me because we are just adding support for them to Tungsten and need to look out for them in user databases.

To interpret the binlog report most effectively, you need to understand MySQL binlog event types. MySQL replication developers have kindly provided a very helpful description of the MySQL binlog format that is not hard to read. You'll need to refer to it if you get very deeply into binlog analysis. It certainly beats reading the MySQL replication code, which is a bit of a thicket.
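
The core of such a report is simple: mysqlbinlog prints a header comment for each event that carries a timestamp and the event type, so counting headers by type and by second yields the event counts and the peak and average rates. The Python sketch below is not binlog-analyze.pl; it assumes the header layout shown in its comment, covers only event counts and rates (no byte or SQL statement counts), and uses the hypothetical function name `summarize`.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout of a mysqlbinlog event header, for example:
#   #100117  9:01:02 server id 1  end_log_pos 200   Query   thread_id=4 ...
# The date is YYMMDD and the hour may be space-padded.
HEADER = re.compile(
    r"^#(\d{6})\s+(\d{1,2}:\d{2}:\d{2})\s+server id\s+\d+"
    r"\s+end_log_pos\s+\d+\s+(\w+)"
)


def summarize(lines):
    """Count events by type and find peak and average events per second."""
    event_types = Counter()
    per_second = Counter()
    for line in lines:
        match = HEADER.match(line)
        if match is None:
            continue  # SQL text, '# at N' offsets, etc.
        date, clock, event_type = match.groups()
        when = datetime.strptime(f"{date} {clock}", "%y%m%d %H:%M:%S")
        event_types[event_type] += 1
        per_second[when] += 1
    events = sum(event_types.values())
    if events == 0:
        return {"events": 0, "event_types": {}}
    duration = int((max(per_second) - min(per_second)).total_seconds())
    peak_time, peak_events = per_second.most_common(1)[0]
    return {
        "events": events,
        "event_types": dict(event_types),
        "duration_s": duration,
        "max_events_per_s": peak_events,
        "max_events_time": peak_time,
        "avg_events_per_s": events / max(duration, 1),
    }
```

To use it on real output, pipe mysqlbinlog into a small script that calls `summarize(sys.stdin)` and prints the resulting dictionary. The average is total events divided by the time span of the log, which is how the figures in the report above relate (3106006 events over 3817 s is about 813.73 events per second).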

Anyway, I hope this script proves useful. As you may have noted from the URL the script is checked into the Tungsten project on SourceForge and will be part of future releases. I plan to keep tweaking it regularly to add features and fix bugs. Incidentally, if you see any bugs let me know. There are without doubt a couple left.

Jan 2, 2010

Exploring SaaS Architectures and Database Clustering

Software-as-a-Service (SaaS) is one of the main growth areas in modern database applications. This topic has become a correspondingly important focus for Tungsten, not least of all because new SaaS applications make heavy use of open source databases like MySQL and PostgreSQL that Tungsten supports.

This blog article introduces a series of essays on database architectures for SaaS and how we are adapting Tungsten to enable them more easily. I plan to focus especially on problems of replication and clustering relevant to SaaS—what are the problems, what are the common design patterns to solve them, and how to deploy and operate the solutions. I will also discuss how to make replication and clustering work better for these cases—either using Tungsten features that already exist or features we are designing.

I hope everything you read will be solid, useful stuff. However, I will also discuss problems where we are in effect thinking out loud about on-going design issues, so you may also see some ideas that are half-baked or flat-out wrong. Please do me the kindness of pointing out how they can be improved.

Now let's get started. The most important difference between SaaS applications and ordinary apps is multi-tenancy. SaaS applications are typically designed from the ground up to run multiple tenants (i.e., customers) on shared software and hardware. One popular design pattern is to have users share applications but keep each tenant's data stored in a separate database, spreading the tenant databases over multiple servers as the number of tenants grows.

Multi-tenancy has a number of important impacts on database architecture. I'm going to mention just three, but they are all significant. First of all, multi-tenant databases tend to evolve into complex topologies. Here's a simple example that shows how a successful SaaS application quickly grows from a single, harmless DBMS server to five servers linked by replication with rapid growth beyond.

In the beginning, the application has tenant data stored in separate databases plus an extra database for the list of tenants as well as data shared by every application. In accounting applications, for example, the shared information would include items like currency exchange and VAT rates that are identical for each tenant. Everything fits into a single DBMS server and life is good.

Now business booms and more tenants join, so soon we split the single server into three—a server for the shared data plus two tenant servers. We add replication to move the shared data into tenant databases.

Meanwhile business booms still more. Tenants want to run reports, which have a tendency to hammer the tenant servers. We set up separate analytics servers with optimized hardware and alternative indexing on the schema, plus more replication to load data dynamically from tenant databases.

And this is just the beginning of additional servers as the SaaS adds more customers and invents new services. It is not uncommon for successful SaaS vendors to run 20 or more DBMS servers, especially when you count slave copies maintained for failover and consider that many SaaS vendors also operate multiple sites. At some point in this evolution the topology, including replication as well as management of the databases, is no longer manually maintainable. As we say in baseball, Welcome to the Bigs.

Complex topologies with multiple DBMS servers lead to a second significant SaaS issue: failures. Just having a lot of servers already means failures are a bigger problem than when you run a single DBMS instance. To show why, let's say individual DBMS servers fail in a way that requires you to do something about it on average once a year, which reliability engineers would describe as a Mean Time Between Failures (MTBF) of 365 days. Here is a simple table that shows how often we can expect a failure somewhere among our hosts as their number grows. (Supply your own numbers. These are just plausible samples.)

Number of DBMS Hosts   Days Between Failures
1                      365
2                      182.5
4                      91.3
8                      45.6
16                     22.8
32                     11.4
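
The table is just the single-host MTBF divided by the number of hosts, which holds if hosts fail independently and all share the same MTBF. The sketch below reproduces it; the 365-day default mirrors the sample figure above, and rounding half up to one decimal matches the table (91.25 becomes 91.3).

```python
from decimal import Decimal, ROUND_HALF_UP


def days_between_failures(hosts, mtbf_days=365):
    """Expected days between failures anywhere in a fleet of identical hosts.

    Assumes hosts fail independently with the same MTBF, so a fleet of n hosts
    sees failures n times as often as a single host.
    """
    days = Decimal(mtbf_days) / Decimal(hosts)
    return days.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


for hosts in (1, 2, 4, 8, 16, 32):
    print(f"{hosts:>2} hosts: {days_between_failures(hosts)} days")
```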

Failures are not just more common with more DBMS hosts; they are also more difficult to handle. Consider what happens in the example architecture when a tenant data server fails and has to be replaced with a standby copy. Not only must the replacement replicate correctly from the shared data server, but the analytics server must also be reconfigured to replicate from the replacement. This is not a simple problem. There is currently no replication product for open source databases that handles failures in these topologies without sooner or later becoming confused and/or causing extended downtime.
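
One way to see why a replacement ripples outward is to model the topology as a set of replication links and rewrite every link that touches the failed server. The sketch below uses hypothetical server names for the example architecture (a shared data server, two tenant servers, an analytics server, and a standby for the first tenant server). It only computes which servers need new replication settings and leaves out the hard part, which is restarting each link at the correct position.

```python
def fail_over(edges, failed, standby):
    """Rewire replication links after replacing a failed server with a standby.

    edges is a set of (source, target) pairs meaning "target replicates from
    source". Returns the new link set and the servers whose replication
    settings must change.
    """
    new_edges, to_reconfigure = set(), set()
    for source, target in edges:
        if failed not in (source, target):
            new_edges.add((source, target))
            continue
        source = standby if source == failed else source
        target = standby if target == failed else target
        to_reconfigure.add(target)  # target must change its replication settings
        if source != target:  # drop the standby's old link to the failed server
            new_edges.add((source, target))
    return new_edges, to_reconfigure


topology = {
    ("shared", "tenant1"), ("shared", "tenant2"),
    ("tenant1", "analytics"), ("tenant2", "analytics"),
    ("tenant1", "tenant1-standby"),
}
links, changed = fail_over(topology, "tenant1", "tenant1-standby")
```

Here `changed` is `{"tenant1-standby", "analytics"}`: the standby must start replicating from the shared server, and the analytics server must switch to the standby. Each of those steps is a chance for a manual procedure to go wrong, which is why automation matters.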

There is a third significant SaaS problem: operations on tenants. This includes provisioning new tenants or moving tenants from one database server to another without requiring extended downtime or application reconfiguration. Backing up and restoring individual tenants is another common problem. The one-database-per-tenant model is popular in part because it makes these operations much easier.

Tenant operations are tractable when you just have a few customers. In the same way that failures become more common with more hosts, tenant operations become more common as tenants multiply. It is therefore critical to automate them as well as make the impact on other tenants as small as possible.
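
The one-database-per-tenant model makes these operations easier partly because tenant locations can live in a small directory that applications consult instead of hard-coding connection details. The sketch below is a hypothetical illustration of such a directory (`TenantDirectory`, the server names, and the `tenant_<name>` database naming are all assumptions). New tenants go to the least-loaded server, and a move is a single directory update once the tenant's data has been copied.

```python
class TenantDirectory:
    """Map each tenant to the server and database that hold its data."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.location = {}  # tenant -> (server, database)

    def _load(self, server):
        """Number of tenant databases currently placed on a server."""
        return sum(1 for placed, _ in self.location.values() if placed == server)

    def provision(self, tenant):
        """Place a new tenant database on the least-loaded server."""
        if tenant in self.location:
            raise ValueError(f"tenant {tenant!r} already exists")
        server = min(self.servers, key=self._load)  # ties go to the first server
        self.location[tenant] = (server, f"tenant_{tenant}")
        return self.location[tenant]

    def locate(self, tenant):
        """Return (server, database) for a tenant; apps call this per connection."""
        return self.location[tenant]

    def move(self, tenant, new_server):
        """Repoint a tenant after its database has been copied to new_server."""
        _, database = self.location[tenant]
        self.location[tenant] = (new_server, database)
```

Copying the data itself (backup and restore, or replication catch-up followed by a brief write freeze) is not shown; the directory only makes the final cut-over invisible to applications.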

Complex topologies, failures, and tenant operations are just three of the issues that make SaaS database architectures interesting as well as challenging to design and deploy. It is well worth thinking about how we can improve database clustering and replication to handle SaaS. That is exactly what we are working on with Tungsten. I hope you will follow me as we dive more deeply into SaaS problems and solutions over the next few months.

P.s., If you run a SaaS and are interested in working with us on these features, please contact me at Continuent. I'm not hard to find.

Dec 26, 2009

Proving Master/Slave Clusters Work and Learning along the Way

2009 has been a big year for Tungsten. In January we had (barely) working replication for MySQL. It had some neat features like global IDs and event filters, but to be frank you needed imagination to see the real value. Since then, Tungsten has grown into a full-blown database clustering solution capable of handling a wide range of user problems. Here are just a few of the features we completed over the course of the year:

Autonomic cluster management using business rules to implement auto-discovery of new databases, failover, and quick recovery from failures
Built-in broadcast monitoring of databases and replicators
Integrated backup and restore operations
Pluggable replication management, proven by clustering implementations based on PostgreSQL Warm Standby and Londiste
Multiple routing mechanisms to provide seamless failover and load balancing of SQL
Last, but not least, simple command line installation to configure and start Tungsten in minutes

You can see the results in our latest release, Tungsten 1.2.1, which comes in both open source and commercial flavors. (See our downloads page to get software as well as documentation.)

In the latter part of 2009 we also worked through our first round of customer deployments, which was an adventure but helped Tungsten grow enormously. Along the way, we confirmed a number of hunches and learned some completely new lessons.

Hardware is changing the database game. In particular, performance improvements are shifting clustering in the direction of loosely coupled master/slave replication rather than tightly coupled multi-master approaches. As I laid out in a previous article, the problem space is shifting from database performance to availability, data protection, and utilization.
Software as a Service (SaaS) is an important driver for replication technology. Not only is the SaaS sector growing, but even small SaaS applications can result in complex database topologies that need parallel, bi-directional, and cross-site replication, among other features. SaaS business economics tend to drive building these systems on open source databases like MySQL and PostgreSQL. By supporting SaaS, you support many other applications as well.
Cluster management is hard but worthwhile. Building distributed management with no single point of failure is a challenging problem and probably the place where Tungsten still has the most work to do. Once you get it working, though, it's like magic. We have been focused on trying to make management procedures not just simple but wherever possible to do away with them completely by making the cluster self-managing.
Business rules rock. We picked the Drools rule engine to help control Tungsten and make it automatically reconfigure itself when data sources appear or fail. The result has been an incredibly flexible system that is easy to diagnose and extend. Just one example: floating IP address support for master databases took 2 hours to implement using a couple of new rules that work alongside the existing rule set. If you are not familiar with rules technology, there is still time to make a New Year's resolution to learn it in 2010. It's powerful stuff.
Clustering has to be transparent. I mean really transparent. We were in denial on this subject before we started to work closely with ISPs, where you don't have the luxury of asking people to change code. Tungsten Replicator is now close to a drop-in replacement for MySQL replication as a result. We also implemented proxying based on packet inspection rather than translation and re-execution to raise throughput and reduce incompatibilities visible to applications.
Ship integrated, easy-to-use solutions. We made the mistake of releasing Tungsten into open source as a set of components that users had to integrate themselves. We have since recanted. As penance we now ship fully integrated clusters with simple installation procedures even for open source editions and are steadily extending the installations to cover not just our own software but also database and network configuration.
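
To give a flavor of why rules make a system easy to extend, here is a tiny forward-chaining loop in Python: each rule is a condition plus an action, and the loop keeps firing rules until the facts stop changing. It is not Drools and not Tungsten's rule set; the fact layout, rule names, and host names are hypothetical. Adding floating IP support amounts to appending one rule next to the failover rule.

```python
def run_rules(facts, rules, max_rounds=100):
    """Fire matching rules until no rule matches (simple forward chaining)."""
    fired = []
    for _ in range(max_rounds):
        matched = False
        for name, condition, action in rules:
            if condition(facts):
                action(facts)
                fired.append(name)
                matched = True
        if not matched:
            return fired
    raise RuntimeError("rules did not settle")


def master_failed(facts):
    states = facts["state"]
    return states[facts["master"]] == "FAILED" and "ONLINE" in states.values()


def promote_slave(facts):
    # Pick the first ONLINE host by name; a real system would compare progress.
    facts["master"] = min(h for h, s in facts["state"].items() if s == "ONLINE")


def vip_misplaced(facts):
    return facts["vip"] != facts["master"]


def move_vip(facts):
    facts["vip"] = facts["master"]


RULES = [
    ("failover", master_failed, promote_slave),
    ("floating-ip", vip_misplaced, move_vip),  # the newly added rule
]
```

Each action makes its own condition false, so the loop settles; with a failed master `db1` and online slaves `db2` and `db3`, the failover rule promotes `db2` and the floating IP rule then moves the address to it, without either rule knowing about the other.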

Beyond the features and learning experiences, the real accomplishment of 2009 was to prove that integrated master/slave clusters can solve a wide range of problems from data protection to HA to performance scaling. In fact, what we have implemented actually works a lot better than I expected when we began to design the system back in 2007. (In case this sounds like a lack of faith, plausible ideas do not always work in the clustering field.) If you have not tried Tungsten, download it and see if you share my opinion.

Finally, keep watching Tungsten in 2010. We are a long way from running out of ideas for making Tungsten both more capable and easier to use. It's going to be a great year.

Oct 31, 2009

Replicating from MySQL to Drizzle and Beyond

Drizzle is one of the really great pieces of technology to emerge from the MySQL diaspora--a lightweight, scalable, and pluggable database for web applications. I am therefore delighted that Marcus Eriksson has published a patch to Tungsten that allows replication from MySQL to Drizzle. He's also working on implementing Drizzle-to-Drizzle support, which will be very exciting.

Marcus has submitted the patch to us and I have reviewed the code. It's quite supportable, so I plan to integrate it as soon as we are done with our next Tungsten release, which will post around 5 November. You will be able to build and run it using our new community builds.

This brings up a question--what about replicating from MySQL to PostgreSQL? What about other databases? I get the PostgreSQL replication question fairly often but it may be a while before our in-house team can implement plug-in support for it. Anybody want to submit a patch in the meantime? Post in the Tungsten forums if you have ideas and need help to get the work done. Tungsten Replicator code is very modular and it is not hard to add new database support.
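
As a hypothetical illustration of what adding database support to a modular replicator can look like, the sketch below defines a small applier interface and two dialect plug-ins that differ only in identifier quoting. Tungsten Replicator itself is written in Java and its real extension interfaces differ; `Applier`, `register`, and `APPLIERS` are names invented for this example.

```python
from abc import ABC, abstractmethod

# Registry of available appliers, keyed by a short database name.
APPLIERS = {}


def register(name):
    """Class decorator that adds an applier class to the registry."""
    def wrap(cls):
        APPLIERS[name] = cls
        return cls
    return wrap


class Applier(ABC):
    """Turn a replicated row insert into a parameterized SQL statement."""

    @abstractmethod
    def quote(self, identifier):
        """Quote a table or column name for the target database."""

    def insert(self, table, row):
        columns = ", ".join(self.quote(name) for name in row)
        placeholders = ", ".join(["%s"] * len(row))
        sql = f"INSERT INTO {self.quote(table)} ({columns}) VALUES ({placeholders})"
        return sql, list(row.values())  # values are bound by the driver


@register("mysql")
class MySQLApplier(Applier):
    def quote(self, identifier):
        return "`" + identifier.replace("`", "``") + "`"


@register("postgresql")
class PostgreSQLApplier(Applier):
    def quote(self, identifier):
        return '"' + identifier.replace('"', '""') + '"'
```

In this design a new target database is one more subclass registered under its own name; the shared event handling stays untouched.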

Meanwhile, go Marcus!!

Community Builds for Tungsten Clustering

It's been almost two months since I have posted anything on the Scale-Out Blog, as our entire team has been heads-down working on Tungsten. We now have a number of accomplishments that are worth writing articles about. Item one on that list is community builds for Tungsten clusters.

Tungsten community builds offer a bone-simple process to check out and build Tungsten clustering software. The result is a fully integrated package that includes replication, management, monitoring, and SQL routing. The community builds work for MySQL 5.0 and 5.1 and also allow you to set up basic replication from MySQL to Oracle.

Community builds do not include much logic for autonomic management, including automated failover and sophisticated rules that keep databases up and running rain or shine. Those and other features like floating IP address support are part of the commercial Tungsten software. PostgreSQL and Oracle-to-Oracle support are also commercial-only, at least for the time being.

Community builds do include our standard installation process, which allows you to set up a working cluster in a few minutes. You can back up and restore databases, check liveness of cluster members, fail over master databases for maintenance, and use a lot of other handy features. There is also full documentation, located here.

To get started, you need a host running Mac OS X, Linux, or Solaris that meets the following prerequisites. On Linux you can usually satisfy these requirements using Yum or Apt-get if the required software is not already there.

Java JDK 1.5 or higher
Ant 1.7.0 or higher (for builds)
Subversion (we use version 1.6.1)
MySQL 5.0 or 5.1 (only on hosts where the cluster is installed)
Ruby 1.8.5 or greater (only on hosts where the cluster is installed)
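
Before starting the build, a quick check that the required commands are on the PATH can save a failed run. The Python sketch below only tests for presence, not versions; the command chosen for each prerequisite (for example `javac` for the JDK) is an assumption about a typical installation.

```python
import shutil

# Command assumed to stand for each prerequisite; mysql and ruby only matter
# on hosts where the cluster will be installed.
PREREQS = {
    "Java JDK": "javac",
    "Ant": "ant",
    "Subversion": "svn",
    "MySQL client": "mysql",
    "Ruby": "ruby",
}


def missing_prereqs(prereqs=PREREQS):
    """Return the names of prerequisites whose command is not on the PATH."""
    return [name for name, command in prereqs.items() if shutil.which(command) is None]
```

Calling `missing_prereqs()` returns an empty list when everything is found; version numbers still need to be checked by hand, for example with `java -version` and `ant -version`.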

Now you can grab the software and do a build. Make a work directory, cd to it, and enter the following commands. (Due to truncation on the blog the SVN URL looks a little funny. Don't be fooled.)

svn checkout \
https://tungsten.svn.sourceforge.net/\
svnroot/tungsten/trunk/community

cd community
./release-community.sh    # (Press ENTER when prompted)

The release-community.sh script checks out most of the Tungsten code for you and does a build. IMPORTANT NOTE: The command shown above builds SVN HEAD, which means you may have a life of adventure. You can also build off branches, which are more or less stable. Look at the available config files in the community directory.

After the build finishes, you have ready-to-install clustering software. You can scp the resulting tar.gz file out to another host or just cd directly into the build itself as shown below and run the configure script, which sets up Tungsten software on a single host.

cd build/tungsten-community-2009-1.2
./configure

You may need to read the manuals so you get all the answers right. The installation manual is posted here at www.continuent.com. You'll also need to look at the Replication Guide, Chapter 2 to see how to set up MySQL properly. We'll do that automatically in the future, but for now it's help yourself. (Don't worry: the database set-up is easy.)

To make the cluster interesting you should install on at least a couple of hosts. Here's what an installed cluster looks like using the Tungsten cluster control (cctrl) program.

[tungsten@centos5a tungsten-community-2009-1.2]$ tungsten-manager/bin/cctrl
[LOGICAL] /cluster/comm/> ls

COORDINATOR[centos5a:MANUAL]

ROUTERS:
+-----------------------------------------------------------------------+
|NONE                                                                   |
+-----------------------------------------------------------------------+

DATASOURCES:
+-----------------------------------------------------------------------+
|centos5a(master:ONLINE, progress=3)                                    |
+-----------------------------------------------------------------------+
|  REPLICATOR(role=master, state=ONLINE)                                |
|  DATASERVER(state=ONLINE)                                             |
+-----------------------------------------------------------------------+

+-----------------------------------------------------------------------+
|centos5b(slave:ONLINE, progress=3, latency=0.0)                        |
+-----------------------------------------------------------------------+
|  REPLICATOR(role=slave, master=centos5a, state=ONLINE)                |
|  DATASERVER(state=ONLINE)                                             |
+-----------------------------------------------------------------------+

Starting from scratch and pulling code from SourceForge, it takes me about 30 minutes to get to an installed cluster with two nodes. At this point you have access to a very powerful set of tools to protect data, keep your databases available, and scale performance. Look at the manuals. Try it out. If you have questions or feedback, post them in the Tungsten forums. In the meantime, have fun with your database cluster.

p.s., We will post binary builds next week. The current build is in final release checks, so you may notice a few problems--I hit a Ruby warning on configuration that will be fixed shortly.