Friday, April 23, 2010

MySQL Conference Slides and Thoughts on State of the Dolphin

I did two talks on replication and clustering at the recent MySQL Conference in Santa Clara.  Thanks to all of you who attended as well as the fine O'Reilly folks who organized everything.  Slides are posted on the talk descriptions at the following URLs: 

Conferences like the MySQL UC are fun because you get to see all your virtual pals in the flesh and have a beer with them.  This is one of the fundamental open source bonding experiences.  Unfortunately the taps for draft beer stopped working at the bar, and Tungsten is in the middle of a big crunch to get parallel replication working.  I didn't get to hang around a lot this year.  A few things still stood out compared to 2009. 

First of all, the long-term effects of the Oracle acquisition are clear.   Edward Screven's keynote on "The State of the Dolphin" was sensible and boring.  It seemed a telling metaphor for life in the community going forward.  Oracle is going to do an adequate job of MySQL engineering, and a better-than-adequate job on Windows.  This is of course "adequate" in the same way that the word applies to products like Microsoft Word. 

An adequate MySQL is probably the worst possible outcome for the groups trying to build businesses on alternative builds.  It looks like an effective way for Oracle to neutralize competitive threats from below for a few years to come.   On the other hand, it's good for most users, who won't be greatly inclined to switch unless Oracle tries to soak them for big licensing fees.  At least one conference attendee, a licensee of other Oracle products, mentioned that this had already happened.  He's a MariaDB fan now. 

Second, solid state devices (SSDs) are for real.   Andreas Bechtolsheim gave a great talk on the coming SSD revolution at the 2009 MySQL Conference.  It sounded good.  At the 2010 conference we started to see some real test results.  The hype on SSDs is completely justified. 

There was an excellent panel talk sponsored by Fusion-IO that presented some very compelling results, including 10x throughput improvements that allowed one of the companies doing the testing to pull out and repurpose 75% of their hosts.  PCI-based Fusion-IO cards have a 300x to 400x price differential compared to basic rotating disk, but the cost is likely to drop pretty quickly as the technology matures and more competitors enter the field.  Much cheaper SATA alternatives like the Intel X-25 are already starting to flood the low-end market.  Anybody building database systems has to have a plan that accounts for SSDs now.

Third, innovation is continuing apace but the problems (and solutions) are moving away from MySQL.   Mark Callaghan really put his finger on it at his Ignite MySQL talk when he said, "In 3 years MySQL won't be the default DBMS for high-scale applications."  New system investment is going into applications that handle big data, have to utilize new hardware efficiently to operate economically, and require multi-tenancy.  These are good targets for Drizzle, PBXT, Tungsten, and other new projects working to make names for themselves.   We all have to raise our game or MySQL will start to become irrelevant.   It's going to be an interesting year.  :)

Tuesday, April 20, 2010

Customized Data Movement with Tungsten Replicator Pipelines

Have you ever run into a problem where MySQL replication did 95% of what you needed but not the remaining 5% to solve a real problem?  Hacking the binlog is always a possibility, but it typically looks like this example.  Not a pretty sight.  Wouldn't it be easier if replication were a bunch of building blocks you could recombine to create custom replicator processes? 

Tungsten 1.3 has a new feature called pipelines that allows you to do exactly that.  A pipeline consists of one or more stages that tie together generic components to extract, filter, store, and apply events, which is Tungsten parlance for transactions.  Each stage has a processing thread, so multi-stage pipelines can process data independently and without blocking.  The stages also take care of important but tedious issues like remembering the transactional state of each stage so Tungsten can restart without forgetting events or applying them twice.
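To make the stage idea concrete, here is a minimal Java sketch of a single stage that extracts, filters, and applies events while remembering the last sequence number it processed, so a restart neither skips nor repeats events. The names (ReplEvent, Extractor, Filter, Applier, Stage, ListExtractor) are hypothetical and are not Tungsten's real API, and a real replicator would have to persist the restart position durably rather than keep it in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative and are NOT Tungsten's API.
// An event is one replicated transaction identified by a sequence number.
final class ReplEvent {
    final long seqno;
    final String sql;
    ReplEvent(long seqno, String sql) { this.seqno = seqno; this.sql = sql; }
}

interface Extractor { ReplEvent nextAfter(long seqno); } // null when caught up
interface Filter { ReplEvent filter(ReplEvent e); }       // null drops the event
interface Applier { void apply(ReplEvent e); }

// One stage: extract -> filters -> apply, tracking the last processed seqno so a
// restart resumes right after it instead of skipping or re-applying events.
final class Stage {
    private final Extractor extractor;
    private final List<Filter> filters;
    private final Applier applier;
    private long lastSeqno; // a real replicator would persist this durably

    Stage(Extractor extractor, List<Filter> filters, Applier applier, long restartSeqno) {
        this.extractor = extractor;
        this.filters = filters;
        this.applier = applier;
        this.lastSeqno = restartSeqno;
    }

    /** Processes every available event; returns how many reached the applier. */
    int drain() {
        int applied = 0;
        ReplEvent e;
        while ((e = extractor.nextAfter(lastSeqno)) != null) {
            ReplEvent out = e;
            for (Filter f : filters) {
                out = f.filter(out);
                if (out == null) break; // filtered out; skip remaining filters
            }
            if (out != null) {
                applier.apply(out);
                applied++;
            }
            lastSeqno = e.seqno; // advance even when a filter dropped the event
        }
        return applied;
    }

    long lastSeqno() { return lastSeqno; }
}

// Extractor over an in-memory list that is assumed to be ordered by seqno.
final class ListExtractor implements Extractor {
    private final List<ReplEvent> log;
    ListExtractor(List<ReplEvent> log) { this.log = log; }
    public ReplEvent nextAfter(long seqno) {
        for (ReplEvent e : log) if (e.seqno > seqno) return e;
        return null;
    }
}

final class StageDemo {
    public static void main(String[] args) {
        List<ReplEvent> log = List.of(new ReplEvent(1, "INSERT a"),
                new ReplEvent(2, "INSERT b"), new ReplEvent(3, "INSERT c"));
        List<String> applied = new ArrayList<>();
        // Restart as if seqno 1 had already been committed before a crash.
        Stage stage = new Stage(new ListExtractor(log), List.of(), e -> applied.add(e.sql), 1);
        stage.drain();
        System.out.println(applied);
    }
}
```

Keeping the position inside the stage is what lets each stage in a multi-stage pipeline restart independently.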

Here is a picture of how a pipeline is put together.

When Tungsten Replicator starts it loads a pipeline corresponding to its role, for example master or slave.   The preceding picture shows a slave pipeline consisting of two stages.  The first stage pulls replicated events over the network from a master Tungsten Replicator and stores them in a local transaction history log, which we call the THL.  The second stage extracts the stored events and applies them to the database.   This pipeline is analogous to the I/O and SQL threads on a MySQL slave.
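The two-stage slave pipeline can be sketched in Java with one thread per stage. In this hypothetical example one queue stands in for the network feed from the master and another stands in for the local THL; a real THL is durable storage, not an in-memory queue, and the class and method names are illustrative only. Because the queue between the stages is unbounded, the remote-to-thl thread keeps pulling events while thl-to-dbms is still applying earlier ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch of a two-stage slave pipeline, analogous to the I/O and
// SQL threads on a MySQL slave. Not Tungsten code.
final class TwoStagePipeline {
    static final String END = "__END__"; // sentinel that shuts a stage down

    /** Runs both stages on their own threads; returns what the "DBMS" received. */
    static List<String> run(List<String> masterEvents) {
        BlockingQueue<String> network = new LinkedBlockingQueue<>(masterEvents);
        network.add(END);
        BlockingQueue<String> localThl = new LinkedBlockingQueue<>(); // stand-in THL
        List<String> dbms = Collections.synchronizedList(new ArrayList<>());

        // Stage 1 (remote-to-thl): pull from the master, store in the local log,
        // then pass the sentinel along so stage 2 knows when to stop.
        Thread remoteToThl = new Thread(() -> {
            pump(network, localThl::add);
            localThl.add(END);
        }, "remote-to-thl");
        // Stage 2 (thl-to-dbms): read the local log and apply to the database.
        Thread thlToDbms = new Thread(() -> pump(localThl, dbms::add), "thl-to-dbms");

        remoteToThl.start();
        thlToDbms.start();
        try {
            remoteToThl.join();
            thlToDbms.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for stages", e);
        }
        return dbms;
    }

    // Moves events from source to sink until the END sentinel arrives.
    private static void pump(BlockingQueue<String> source, Consumer<String> sink) {
        try {
            for (String e = source.take(); !e.equals(END); e = source.take()) {
                sink.accept(e);
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("tx1", "tx2", "tx3")));
    }
}
```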

Where Tungsten departs from MySQL and most other replicators in a big way is that pipelines, hence the replication flows, are completely configurable.   The configuration is stored in file  Here are the property settings to create the slave pipeline.  Note how the role is the name of a pipeline.  This determines which pipeline to run when the replicator goes online.

# Replicator role. 

# Generic pipelines.
replicator.pipelines=master,slave,direct 

# Slave pipeline has two stages:  extract from remote THL to local THL; 
# extract from local THL and apply to DBMS. 



The syntax is not beautiful but it is quite flexible.  Here is what this definition means.
  1. This replicator knows about three pipelines named master, slave, and direct.
  2. The slave pipeline has two stages called remote-to-thl and thl-to-dbms and a store called thl.  It has a property named syncTHLWithExtractor which must be set to false for slaves.  (We need to change that name to something like 'isMaster'.) 
  3. The remote-to-thl stage extracts from thl-remote.  This extractor reads events over the network from a remote replicator.  The stage applies to thl-local, which is an applier that writes events to the local transaction history log. 
  4. The thl-to-dbms stage pulls events from the local log and applies them to the database.  Note that in addition to an applier and extractor, there is also a filter named mysqlsessions.  This filter looks at events and modifies them to generate a pseudo-session ID, which is necessary to avoid problems with temporary tables when applying transactions from multiple sessions.  It is just one of a number of filters that Tungsten provides.
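Here is one way to sketch the role check in Java: load the properties, confirm that the role names one of the pipelines listed in replicator.pipelines, and return that pipeline's stages. Only replicator.pipelines appears in the settings shown in this post; the keys replicator.role and replicator.pipeline.<name> are assumptions made for this illustration, not confirmed Tungsten property names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch: pick the pipeline named by the replicator role and list its stages.
// "replicator.role" and "replicator.pipeline.<name>" are ASSUMED key names.
final class PipelineConfig {
    static List<String> stagesForRole(Properties props) {
        String role = props.getProperty("replicator.role");
        if (role == null) throw new IllegalArgumentException("no replicator.role set");
        List<String> pipelines = splitList(props.getProperty("replicator.pipelines", ""));
        if (!pipelines.contains(role)) {
            throw new IllegalArgumentException(
                    "role '" + role + "' is not a declared pipeline: " + pipelines);
        }
        List<String> stages = splitList(props.getProperty("replicator.pipeline." + role, ""));
        if (stages.isEmpty()) {
            throw new IllegalArgumentException("pipeline '" + role + "' has no stages");
        }
        return stages;
    }

    // Splits "a, b,c" into [a, b, c], trimming stray whitespace around commas.
    static List<String> splitList(String value) {
        List<String> out = new ArrayList<>();
        for (String part : value.split(",")) {
            String t = part.trim();
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        Properties p = parse(String.join("\n",
                "replicator.role=slave",
                "replicator.pipelines=master,slave,direct",
                "replicator.pipeline.slave=remote-to-thl,thl-to-dbms"));
        System.out.println(stagesForRole(p));
    }
}
```

Failing fast when the role is not a declared pipeline catches a typo at startup instead of when the replicator goes online.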
Components like appliers, filters, extractors, and stores have individual configuration elsewhere in the file.  Here's an example of configuration for a MySQL binlog extractor.  (Note that Tungsten 1.3 can now read binlogs directly as files or relay them from a master server.) 

# MySQL binlog extractor properties.  

# When using relay logs we download from the master into binlog_dir.  This 
# is used for off-board replication. 

The thing that makes pipelines really flexible is that the interfaces are completely symmetric.  Components to extract events from the MySQL binlog or from a transaction history log have identical APIs.  Similarly, the APIs to apply events are the same whether storing events in a log or applying to a slave.  Pipelines can tie together practically any sequence of extract, filter, and apply operations you can think of. 
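The payoff of symmetric interfaces shows up even in a toy version. In this hypothetical Java sketch (EventSource, EventSink, MemoryLog, and FixedSource are illustrative names, not Tungsten classes), a log is both a sink and a source, so one generic stage loop can copy events from a binlog stand-in into the log and then from the log into a database stand-in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical symmetric interfaces: every source looks the same to a stage,
// and so does every sink.
interface EventSource { String next(); }          // returns null when no events remain
interface EventSink { void accept(String event); }

// In-memory stand-in for a transaction history log: events are stored into it
// as a sink and read back in order as a source.
final class MemoryLog implements EventSource, EventSink {
    private final List<String> events = new ArrayList<>();
    private int readPos = 0;
    public void accept(String event) { events.add(event); }
    public String next() { return readPos < events.size() ? events.get(readPos++) : null; }
}

// Stand-in for a binlog reader: a source over a fixed list of events.
final class FixedSource implements EventSource {
    private final List<String> events;
    private int pos = 0;
    FixedSource(List<String> events) { this.events = events; }
    public String next() { return pos < events.size() ? events.get(pos++) : null; }
}

final class SymmetricStages {
    // The stage sees only the interfaces, so any source can feed any sink.
    static int runStage(EventSource source, EventSink sink) {
        int moved = 0;
        for (String e = source.next(); e != null; e = source.next()) {
            sink.accept(e);
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        MemoryLog thl = new MemoryLog();
        List<String> database = new ArrayList<>();
        runStage(new FixedSource(List.of("insert 1", "update 2")), thl); // binlog -> log
        runStage(thl, database::add);                                    // log -> "DBMS"
        System.out.println(database);
    }
}
```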

Here are diagrams of a couple of useful single-stage pipelines. 

The "dummy" pipeline reads events directly from MySQL binlogs and just throws them away.  This sounds useless but in fact it is rather convenient.  You can use the dummy pipeline to check whether your binlogs are good.  If you add filters you can also use a dummy pipeline to report on what is in the binlog.  Finally, you can use it as a quick and non-intrusive check to see if Tungsten can handle the data in your binlog--a nice way to ensure you can migrate smoothly. 
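A dummy pipeline with a reporting filter might look roughly like this in Java. It is a sketch under simple assumptions: a list of SQL strings stands in for the binlog, the filter just counts events by their leading keyword, and the applier is a bit bucket that discards everything. None of these names come from Tungsten.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical "dummy" pipeline: extract -> reporting filter -> bit bucket.
final class DummyPipeline {
    // Counts events by leading SQL keyword and passes each event through unchanged.
    static final class ReportFilter implements UnaryOperator<String> {
        final Map<String, Integer> counts = new TreeMap<>();
        public String apply(String event) {
            String trimmed = event.trim();
            String keyword = trimmed.isEmpty()
                    ? "(empty)"
                    : trimmed.split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
            counts.merge(keyword, 1, Integer::sum);
            return event;
        }
    }

    // The bit-bucket applier: accepts every event and keeps nothing.
    static final Consumer<String> BIT_BUCKET = event -> { };

    static Map<String, Integer> run(List<String> binlogEvents) {
        ReportFilter report = new ReportFilter();
        for (String event : binlogEvents) {
            BIT_BUCKET.accept(report.apply(event));
        }
        return report.counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(
                "INSERT INTO t VALUES (1)", "update t set c = 2", "INSERT INTO t VALUES (3)")));
    }
}
```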

Here's the dummy pipeline definition:

# Generic pipelines. 
replicator.pipelines=master,slave,direct,dummy
# Dummy pipeline has single stage that writes from binlog to bit-bucket. 


The "direct" pipeline fetches events directly from a master MySQL server using client log requests over the network and applies them immediately to a slave.  I use this pipeline to test master-to-slave performance, but it's also very handy for transferring a set of SQL updates from the binlog of any master to any slave on the network.  For instance, you can transfer upgrade commands very efficiently out of the binlog of a successfully upgraded MySQL server to other servers on the network.  You can also use it to "rescue" transactions that are stuck in the binlog of a failed master.  That is starting to be genuinely useful. 
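The rescue case boils down to reading the failed master's binlog, skipping what the slave has already applied, and applying the remainder in order. This hypothetical Java sketch models binlog positions as plain numbers and collects the statements instead of executing them; real binlog coordinates are a file name plus an offset, and a real applier would run each statement against the slave.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of rescuing stuck transactions with a direct pipeline.
// BinlogEvent and the numeric positions are illustrative, not a real API.
final class DirectRescue {
    static final class BinlogEvent {
        final long position;
        final String sql;
        BinlogEvent(long position, String sql) { this.position = position; this.sql = sql; }
    }

    /**
     * Returns the statements to apply to the slave, in binlog order, given the
     * last position the slave is known to have applied. Assumes the binlog list
     * is ordered by position.
     */
    static List<String> rescue(List<BinlogEvent> masterBinlog, long slaveAppliedThrough) {
        List<String> toApply = new ArrayList<>();
        for (BinlogEvent e : masterBinlog) {
            if (e.position > slaveAppliedThrough) {
                toApply.add(e.sql); // a real applier would execute this on the slave
            }
        }
        return toApply;
    }

    public static void main(String[] args) {
        List<BinlogEvent> binlog = List.of(
                new BinlogEvent(100, "INSERT INTO t VALUES (1)"),
                new BinlogEvent(200, "INSERT INTO t VALUES (2)"),
                new BinlogEvent(300, "INSERT INTO t VALUES (3)"));
        System.out.println(rescue(binlog, 200));
    }
}
```

Getting the slave's applied position right is the critical part: too low and transactions are applied twice, too high and they are lost.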

The definition of the direct pipeline is already in the default template that comes with Tungsten 1.3, so it is not necessary to repeat it here.  You can just download the software (open source version is here) and have a look at it yourself.  There's almost more documentation than people can bear--look here to find a full set.  Version 1.3 docs will be posted shortly on the website and are already available for commercial customers.   As usual you can also view the source code on 

Pipelines belong to a set of major feature improvements to Tungsten to support SaaS and large enterprise deployments.  Some of the other features include fast event logging directly to disk (no more posting events in InnoDB), low-latency WAN transfer, multi-master replication support, and parallel replication.  Stay tuned!

Scaling Databases Using Commodity Hardware and Shared-Nothing Design