Thursday, January 28, 2010

MariaDB is Thinking about Fixing MySQL Replication and You Can Help

In case you have not noticed, MariaDB is joining the list of projects thinking about how to improve MySQL replication.   The discussion thread starts here on the maria-developers mailing list.

This discussion was jointly started by Monty Program, Codership, and Continuent (my employer) in an effort to push the state of the art beyond features offered by the current MySQL replication.  Now that things are starting to die down with the Oracle acquisition, we can get back to the job of making the MySQL code base substantially better.  The first step in that effort is to get a discussion going to develop our understanding of the replication problems we think are most important and outline a strategy to solve them. 

Speaking as a developer on Tungsten, my current preference would to be to improve the existing MySQL replication.  I suspect this would also be the preference of most current MySQL users.  However, there are also more radical approaches on the table, for example from our friends at Codership, who are developing an innovative form of multi-master replication based on group communications and transaction certification.  That's a good thing, as we want a range of contrasting ideas that take full advantage of the creativity in the community on the topic of replication.

If you have interest in improving MySQL replication please join the MariaDB project and contribute your thoughts.  It should be an interesting conversation.

Wednesday, January 27, 2010

Tungsten 1.2.2 Release is Out - Faster, More Stable, More Fun

Release 1.2.2 of Tungsten Clustering is available on SourceForge as well as through the Continuent website.  The release contains mostly bug fixes in the open source version but there are also two very significant improvements of interest to all users.
  • The manager and monitoring capabilities of Tungsten are completely integrated on the same group communications channel.  This fixes a number of problems that caused data sources not to show up properly in older versions.  
  • We are officially supporting a new Tungsten Connector capability for MySQL called pass-through mode, which allows us to proxy connections by transferring network blocks directly rather than translating native request protocol to JDBC calls.  Our tests show that it speeds up throughput by as much as 200% in some cases. 
The commercial version has additional features like PostgreSQL warm standby clustering, add-on rules to manage master virtual IP addresses and other niceties.   If you are serious about replication and clustering it is worth a look.

This is a good time to give a couple of reminders for Tungsten users.  First, Tungsten is distributed as a single build that integrates replication, management, monitoring, and connectivity.   The old Tungsten Replicator and Myosotis builds are going away.   Second, we have a single set of docs on the Continuent website that covers both open source and commercial distributions.

With that, enjoy the new release.  If you are using the open source edition, please post your experiences in the Tungsten community forums or write a blog article.  We would love to hear from you.

P.s., We have added Drizzle support thanks to a patch from Marcus Eriksson but it's not in 1.2.2.  For that you need to build directly from the SVN trunk.  Drizzle support will be out in binary builds as part of Tungsten version 1.3.

Sunday, January 17, 2010

What's in *Your* Binlog?

Over the last couple of months I have run into a number of replication problems where I needed to run reports on MySQL binlogs to understand what sort of updates servers were processing as well as to compute peak and average throughput.   It seems that not even Maatkit has a simple tool to report on binlog contents, so I wrote a quick Perl script called analyze-binlog.pl to summary output from mysqlbinlog, which is the standard tool to dump binlogs to text. 

The script operates as a filter with the following syntax:
Usage: ./binlog-analyze.pl [-h] [-q] [-v]
Options:
  -h : Print help
  -q : Suppress excess output
  -v : Print verbosely for debugging
To get a report, you just run mysqlbinlog on a binlog file and pipe the results into analyze-binlog.pl.  Here is typical invocation and output.  The -q option keeps the output as short as possible.
$ mysqlbinlog /var/lib/mysql/mysql-bin.001430 | ./binlog-analyze.pl -q
===================================
| SUMMARY INFORMATION             |
===================================
Server Version    : 5.0.89
Binlog Version    : 4
Duration          : 1:03:37 (3817s)

===================================
| SUMMARY STATISTICS              |
===================================
Lines Read        :        17212685
Events            :         3106006
Bytes             :      1073741952
Queries           :         2235077
Xacts             :          817575
Max. Events/Second:            5871.00
Max. Bytes/Second :         1990077.00
Max. Event Bytes  :          524339
Avg. Events/Second:             813.73
Avg. Bytes/Second :          281305.20
Avg. Queries/Sec. :             585.56
Avg. Xacts/Sec.   :             214.19
Max. Events Time  :         9:01:02

===================================
| EVENT COUNTS                    |
===================================
Execute_load_query   :           10
Intvar               :        53160
Query                :      2235077
Rotate               :            1
Start                :            1
User_var             :          182
Xid                  :       817575

===================================
| SQL STATEMENT COUNTS            |
===================================
begin                :       817585
create temp table    :            0
delete               :        31781
insert               :           20
insert into          :       411266
select into          :            0
update               :       633857

There are lots of things to see in the report, so here are a few examples.  For one thing, peak update rates generate 5871 events and close to 2Mb of log output per second.  That's loaded but not enormously so--MySQL replication can easily dump over 10,000 events per second into the binlog using workhorse 4-core machines.  The application(s) connected to the database execute a large number of fast, short transactions--typical of data logging operations, for example storing session data.  We can also see from the Execute_load_query events that somebody executed MySQL LOAD DATA INFILE commands.  That's interesting to me because we are just putting them into Tungsten and need to look out for them in user databases.

To interprete the binlog report most effectively, you need to understand MySQL binlog event types.  MySQL replication developers have kindly provided a very helpful description of the MySQL binlog format that is not hard to read.  You'll need to refer to it if you get very deeply into binlog analysis.  It certainly beats reading the MySQL replication code, which is a bit of a thicket.

Anyway, I hope this script proves useful.  As you may have noted from the URL the script is checked into the Tungsten project on SourceForge and will be part of future releases.  I plan to keep tweaking it regularly to add features and fix bugs.  Incidentally, if you see any bugs let me know.  There are without doubt a couple left.

Saturday, January 2, 2010

Exploring SaaS Architectures and Database Clustering

Software-as-a-Service (Saas) is one of the main growth areas in modern database applications.  This topic has become a correspondingly important focus for Tungsten, not least of all because new SaaS applications make heavy use of open source databases like MySQL and PostgreSQL that Tungsten supports.

This blog article introduces a series of essays on database architectures for SaaS and how we are adapting Tungsten to enable them more easily.  I plan to focus especially on problems of replication and clustering relevant to SaaS—what are the problems, what are the common design patterns to solve them, and how to deploy and operate the solutions. I will also discuss how to make replication and clustering work better for these cases—either using Tungsten features that already exist or features we are designing.

I hope everything you read will be solid, useful stuff. However, I will also discuss problems where we are in effect thinking out loud about on-going design issues, so you may also see some ideas that are half-baked or flat-out wrong.  Please do me the kindness of pointing out how they can be improved.

Now let's get started. The most important difference between SaaS applications and ordinary apps is multi-tenancy. SaaS applications are typically designed from the ground up to run multiple tenants (i.e., customers) on shared software and hardware. One popular design pattern is to have users share applications but keep each tenant's data stored in a separate database, spreading the tenant databases over multiple servers as the number of tenants grows. 

Multi-tenancy has a number of important impacts on database architecture. I'm going to mention just three, but they are all significant. First of all, multi-tenant databases tend to evolve into complex topologies. Here's a simple example that shows how a successful SaaS application quickly grows from a single, harmless DBMS server to five servers linked by replication with rapid growth beyond.  

In the beginning, the application has tenant data stored in separate databases plus an extra database for  the list of tenants as well as data shared by every application. In accounting applications, for example, the shared information would include items like currency exchange and VAT rates that are identical for each tenant. Everything fits into a single DBMS server and life is good. 

Now business booms and more tenants join, so soon we split the single server into three—a server for the shared data plus two tenant servers. We add replication to move the shared data into tenant databases. 

Meanwhile business booms still more. Tenants want to run reports, which have a tendency to hammer the tenant servers. We set up separate analytics servers with optimized hardware and alternative indexing on the schema, plus more replication to load data dynamically from tenant databases.

And this is just the beginning of additional servers as the SaaS adds more customers and invents new services.  It is not uncommon for successful SaaS vendors to run 20 or more DBMS servers, especially when you count slave copies maintained for failover and consider that many SaaS vendors also operate multiple sites.  At some point in this evolution the topology, including replication as well as management of the databases, is no longer manually maintainable.  As we say in baseball, Welcome to the Bigs.

Complex topologies with multiple DBMS servers lead to a second significant SaaS issue: failures. Just having a lot of servers already means failures are a bigger problem than when you run a single DBMS instance. To show why, let's say individual DBMS servers fail in a way that requires you do something about it on average once a year, a number that reliability engineers call Mean Time between Failures (MTBF). Here is a simple table that shows how often we can expect an individual failure to occur.  (Supply your own numbers.   These are just plausible samples.)

Number of DBMS Hosts


Days Between Failures
1


365
2


182.5
4


91.3
8


45.6
16


22.8
32


11.4

Failures are not just more common with more DBMS hosts, but more difficult to handle. Consider what happens in the example architecture when a tenant data server fails and has to be replaced with a standby copy. The replacement must not only replicate correctly from the shared data server, but the analytic server must also be reconfigured to replicate correctly as well. This is not a simple problem. There's currently no replication product for open source databases that handles failures in these topologies without sooner or later becoming confused and/or leading to extended downtime.

There is a third significant SaaS problem:  operations on tenants. This includes provisioning new tenants or moving tenants from one database server to another without requiring extended downtime or application reconfiguration.  Backing up and restoring individual tenants is another common problem. The one-database-per-tenant model is popular in part because it makes these operations much easier.

Tenant operations are tractable when you just have a few customers.  In the same way that failures become more common with more hosts, tenant operations become more common as tenants multiply. It is therefore critical to automate them as well as make the impact on other tenants as small as possible.

Complex topologies, failures, and tenant operations are just three of the issues that make SaaS database architectures interesting as well as challenging to design and deploy.  It is well worth thinking about how we can improve database clustering and replication to handle SaaS.  That is exactly what we are working on with Tungsten.  I hope you will follow me as we dive more deeply into SaaS problems and solutions over the next few months. 

P.s., If you run a SaaS and are interested working with us on these features, please contact me at Continuent.  I'm not hard to find. 

Scaling Databases Using Commodity Hardware and Shared-Nothing Design