Saturday, October 31, 2009

Replicating from MySQL to Drizzle and Beyond

Drizzle is one of the really great pieces of technology to emerge from the MySQL diaspora--a lightweight, scalable, and pluggable database for web applications. I am therefore delighted that Marcus Erikkson has published a patch to Tungsten that allows replication from MySQL to Drizzle. He's also working on implementing Drizzle-to-Drizzle support, which will be very exciting.

Marcus has submitted the patch to us and I have reviewed the code. It's quite supportable, so I plan to integrate it as soon as we are done with our next Tungsten release, which will post around 5 November. You will be able to build and run it using our new community builds.

This brings up a question--what about replicating from MySQL to PostgreSQL? What about other databases? I get the PostgreSQL replication question fairly often but it may be a while before our in-house team can implement plug-in support for it. Anybody want to submit a patch in the meantime? Post in the Tungsten forums if you have ideas and need help to get the work done. Tungsten Replicator code is very modular and it is not hard to add new database support.

Meanwhile, go Marcus!!

Community Builds for Tungsten Clustering

It's been almost two months since I have posted anything on the Scale-Out Blog, as our entire team has been heads-down working on Tungsten. We now have a number of accomplishments that are worth writing articles about. Item one on that list is community builds for Tungsten clusters.

Tungsten community builds offer a bone-simple process to check out and build Tungsten clustering software. The result is a fully integrated package that includes replication, management, monitoring, and SQL routing. The community builds work for MySQL 5.0 and 5.1 and also allow you to set up basic replication from MySQL to Oracle.

Community builds do not include much logic for autonomic management, including automated failover and sophisticated rules that keep databases up and running rain or shine. Those and other features like floating IP address support are part of the commercial Tungsten software. PostgreSQL and Oracle-to-Oracle support is also commercial only at least for the time being.

Community builds do include our standard installation process, which allows you to set up a working cluster a few minutes. You can back up and restore datebases, check liveness of cluster members, failover master databases for maintenance and a lot of other handy features. There is also full documentation, located here.

To get started, you need a host running Mac OS X, Linux, or Solaris that meets the following prerequisites. On Linux you can usually satisfy these requirements using Yum or Apt-get if the required software is not already there.
  • Java JDK 1.5 or higher.
  • Ant 1.7.0 or higher for builds
  • Subversion. We use version 1.6.1
  • MySQL 5.0 or 5.1 (only on hosts where cluster is installed)
  • Ruby 1.8.5 or greater (only on hosts where cluster is installed)
Now you can grab the software and do a build. Make a work directory, cd to it, and enter the following commands. (Due to truncation on the blog the SVN URL looks a little funny. Don't be fooled.)
svn checkout \
https://tungsten.svn.sourceforge.net/\
svnroot/tungsten/trunk/community

cd community
./release-community.sh # (Press ENTER when prompted)
The release-community.sh script checks out most of the Tungsten code for you and does a build. IMPORTANT NOTE: The command shown above builds SVN HEAD, which means you may have a life of adventure. You can also build off branches which are more or less stable. Look at the available config files in the community directory.

After the build finishes, you have ready-to-install clustering software. You can scp the resulting tar.gz file out to another host or just cd directly into the build itself as shown below and run the configure script, which sets up Tungsten software on a single host.
cd build/tungsten-community-2009-1.2
./configure
You may need to read the manuals so you get all the answers right. The installation manual is posted here at www.continuent.com. You'll also need to look at the Replication Guide, Chapter 2 to see how to set up MySQL properly. We'll do that automatically in the future, but for now it's help yourself. (Don't worry: the database set-up is easy.)

To make the cluster interesting you should install on at least a couple of hosts. Here's what an installed cluster looks like using the Tungsten cluster control (cctrl) program.
[tungsten@centos5a tungsten-community-2009-1.2]$ tungsten-manager/bin/cctrl
[LOGICAL] /cluster/comm/> ls

COORDINATOR[centos5a:MANUAL]

ROUTERS:
+-----------------------------------------------------------------------+
|NONE |
+-----------------------------------------------------------------------+

DATASOURCES:
+-----------------------------------------------------------------------+
|centos5a(master:ONLINE, progress=3) |
+-----------------------------------------------------------------------+
| REPLICATOR(role=master, state=ONLINE) |
| DATASERVER(state=ONLINE) |
+-----------------------------------------------------------------------+

+-----------------------------------------------------------------------+
|centos5b(slave:ONLINE, progress=3, latency=0.0) |
+-----------------------------------------------------------------------+
| REPLICATOR(role=slave, master=centos5a, state=ONLINE) |
| DATASERVER(state=ONLINE) |
+-----------------------------------------------------------------------+
Starting from scratch and pulling code from SourceForge, it takes me about 30 minutes to get to an installed cluster with two nodes. At this point you have access to a very powerful set of tools to protect data, keep your databases available, and scale performance. Look at the manuals. Try it out. If you have questions or feedback, post them in the Tungsten forums. In the meantime, have fun with your database cluster.

p.s., We will post binary builds next week. The current build is in final release checks, so you may notice a few problems--I hit a Ruby warning on configuration that will be fixed shortly.

Scaling Databases Using Commodity Hardware and Shared-Nothing Design