Monday, March 7, 2011

Understanding Tungsten Replication Services

If you follow Giuseppe Maxia's Datacharmer blog you have seen several recent articles on Tungsten Replicator.  Giuseppe and I work closely together on replication at Continuent, and I have promised a matching set of articles about replication internals that match the practical introduction provided by Giuseppe.  In this first article I will describe replication services, which are message processing flows that run in the Tungsten Replicator.

Unlike many replication engines, Tungsten Replicator can run multiple replication services concurrently.  There is a central management interface that allows you to start new replication services without disturbing services that are already running.  Each replication service also has its own management interface so that you can put the loaded service online, offline, etc. without disturbing other replication work.  As Tungsten is written in Java, the management interfaces are based on JMX, a standard administrative interface for Java applications.

Here is a simple diagram that shows a Tungsten Replicator with two replication services named fra and nyc that replicate from separate DBMS masters in Frankfurt and NYC into a single slave in San Francisco.   You can immediately see the power of replication services--a single Tungsten Replicator process can simultaneously replicate between several locations.  Replication services are an important building block for the type of complex setups that Giuseppe Maxia discussed in his blog article on Replication Topologies

Users who are handy with Java can write their own programs to manipulate the JMX interfaces directly.  If not, there is the trepctl utility, which is supplied with the Tungsten Replicator and works off the command line. 

If the Tungsten Replicator architecture reminds you of a Java application server, you are absolutely right.  Java VMs have a relatively large resource footprint compared to ordinary C programs, so it is typically more efficient to put multiple applications in a single VM rather than running a lot of individual Java processes.  Tungsten replication services follow the same design pattern, except that instead of serving web pages they replicate database transactions.  

Let's now look a little more deeply at how Tungsten Replicator organizes replication services.  Each replication service runs a single pipeline, which is Tungsten parlance for a configurable message flow.  (For more on pipelines, read here.)  When the service starts, it loads an instance of a Java class called OpenReplicatorManager that handles the state machine for replication (online, offline, etc.) and provides the management interfaces for the services.  The OpenRepicatorManager instance in turn depends on a number of external resources from the file system and DBMS. 

Here is another diagram showing how Tungsten Replicator organizes all of the various parts for services.  Services need a configuration file for the pipeline, as well as various bits of disk space to store transaction logs and replication metadata.  The big challenge is to ensure things do not accidentally collide.

This layout seems a bit complex at first but is reasonably simple once you get used to it.  Let's start with service configuration using our fra service as an example. 

Service configuration files are stored in the tungsten-replicator/conf directory.   There are up to two files for each service.  The file defines all properties of the service, pipeline organization and properties like the replication role or master address that may change during operation.  The contains overrides to selected properties.  For instance, if you switch the replication role from slave to master as part of a failover operation, it goes in the file.  Tungsten Replicator reads the static file first, then applies the overrides when it starts the service.

Next, we have Tungsten transaction logs, also known as the Transaction History Log.  This is a list of all transactions to be replicated along with metadata like global transaction IDs and shard IDs.  THL files for each service are normally stored in the logs directory at the same level as the tungsten release directory itself.  There is a separate directory for each service, as for example logs/fra

Next we have Tungsten relay logs.  These are downloaded binlogs from a MySQL master DBMS from which the replication service creates the Tungsten transaction logs.  Not every replication service uses these.  They are required when the MySQL master is on another host, or the binlogs are on an NFS-mounted file system, which Tungsten does not parse very efficiently yet.  Tungsten relay logs use the same pattern as the THL--everything is stored under relay-logs with a separate subdirectory for each service, for example relay-logs/fra

Finally, there is metadata in the DBMS itself.  Each replication service has a database that it uses to store restart points for replication (table trep_commit_seqno) as well as heartbeats and consistency checks (tables heartbeat and consistency, respectively).   The name of this database is tungsten_servicename as in tungsten_fra

Setting up services is difficult to do manually, so Tungsten Replicator 2.0 has a program named configure-service that defines new replication services and removes old ones by deleting all traces including the database catalogs. You can find out all about installation and starting services by looking the Tungsten Replicator 2.0 Installation and Configuration Guide, which is located here

Services have been part of Tungsten Replicator for a while but we have only recently begun to talk about them more widely as part of the release of Tungsten Replicator 2.0.0 in February 2011, especially as we are start to do more work with multi-master topologies.  One of the comments we get is that replication services make Tungsten seem complicated and therefore harder to use, especially compared with MySQL replication, which is relatively easy to set up.  That's a fair criticism.  Tungsten Replicator is really a very configurable toolkit for replication and does far more than MySQL replication or just about any other open source replicator for that matter.   Like most toolkits, the trade-off for power is complexity.

We are therefore working on automating as much of the configuration as possible, so that you can set up even relatively complex topologies with just a couple of commands.  You'll see more of this as we make additional replicator releases (version 2.0.1 will be out shortly) and push features fully into open source.   Meanwhile, if you have comments on Tungsten 2.0.0 please feel free to post them back to us.


Kaj Magnus said...

Concerning this: "Next we have Tungsten relay logs. These are downloaded binlogs from a MySQL master DBMS from which the replication service creates the Tungsten transaction logs."

I'm unsure about what "from which" refers to? Does Tungsten build Tungsten transaction logs from the relay logs ( = downloaded binlogs)? That is, does Tungsten look at the relay logs, parse them, do something with the parsed data, and the result is the Tungsten transaction logs?

A follow up question: Why cannot the relay logs be used directly? Why are Tungsten transaction logs needed instead?

Kaj Magnus said...

(I forgot to subscribe to follow up comments; I'm attempting to subscribe now, via this comment)

Robert Hodges said...

Hi Kaj, sorry about the ambiguity. Tungsten downloads MySQL binlogs over the network to create local copies or reads directly from binlog files as they are written. Either way, we parse the binary format and turn it into our own log format, which we call the transaction history log (THL).

As to your second question: we cannot use the binary logs directly for several reasons.

(a) They only work for MySQL whereas Tungsten also reads logs other DBMS types like Oracle.

(b) They lack important features like global transaction IDs, clearly demarcated transaction boundaries, column names as well as primary keys, and portable storage formats that translate easily to other DBMS data types.

(c) Binlog formats are tied to particular MySQL versions--for example MySQL 5.6 binlogs are not compatible with 5.5 binlogs.

As you can see there are many good reasons. :)

Kaj Magnus said...

Thank you Robert for the explanation, it was interesting

Moody Youssef said...

Hi Robert,

Thanks for this article, it helped.

I'm having some confusion on how the THL are applied to the slave(s). Does the tungsten manager push these events to the slave(s)? Or do the slaves pull?

I am using MySQL btw and when I check the process list when dumping a file into a master, I am seeing a local connection (ex. slave1:35069). So it seems like tungsten is using an ssh tunnel to then apply through a localhost mysql connection? Just want to get clarification on the differences between tungsten and standard Mysql replication Hope this makes sense.


Robert Hodges said...

Glad you liked the article. Slaves pull from the master. They make a local copy of the THL records then apply that to the replica, generally using a JDBC connection in the case of relational DBMS like MySQL or using CSV in the case of data warehouses like Hadoop.

Check out the replicator docs at for more information or post to the discussion list at if you have more questions.

Scaling Databases Using Commodity Hardware and Shared-Nothing Design