Saturday, June 20, 2009

When SANs Go Bad

They sometimes go bad in completely unpredictable ways. Here's a problem I have now seen twice in production. A host boots up nicely and mounts file systems from the SAN. At some point a SAN component (for example, a Fibre Channel switch or controller) fails in such a way that the storage goes away but the file system still appears visible to applications.

This kind of problem is an example of a Byzantine fault where a system does not fail cleanly but instead starts to behave in a completely arbitrary manner. It seems that you can get into a state where the in-memory representation of the file system inodes is intact but the underlying storage is non-responsive. The non-responsive file system in turn can make operating system processes go a little crazy. They continue to operate but show bizarre failures or hang. The result is problems that may not be diagnosed or even detected for hours.

What to do about this type of failure? Here are some ideas.
  1. Be careful what you put on the SAN. Log files and other local data should not go onto the SAN. Use local files with syslog instead. Think about it: your application is sick and trying to write a log message to tell you about it on a non-responsive file system. In fact, if you have a robust scale-out architecture, don't use a SAN at all. Use database replication and/or DRBD instead to protect your data.
  2. Test the SAN configuration carefully, especially failover scenarios. What happens when the host fails over from one access path to another? What happens when another host picks up the LUN from a "failed" host? Do you have fencing properly enabled?
  3. Actively look for SAN failures. Write test files to each mounted file system and read them back as part of your regular monitoring. That way you know that the file system is fully "live."
The last idea gets at a core issue with SAN failures--they are rare, so they are not the first thing people think of when there is a problem. The first time this happened on one of my systems it was around 4 a.m. It took a really long time to figure out what was going on. We didn't exactly feel like geniuses when we finally checked the file system.
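The liveness check in idea 3 can be sketched in a few lines of Python. This is an illustrative monitoring probe, not code from any particular tool; the alarm timeout guards against exactly the hung-I/O failure mode described above, where a plain write to a dead SAN would block forever instead of failing.

```python
import os
import signal
import tempfile

class SanCheckTimeout(Exception):
    pass

def check_mount_live(mount_point, timeout_secs=10):
    """Write a small test file to the mount, read it back, and delete it.
    Return True only if the round trip completes within timeout_secs."""
    def on_timeout(signum, frame):
        raise SanCheckTimeout("I/O on %s hung for %ds" % (mount_point, timeout_secs))

    old_handler = signal.signal(signal.SIGALRM, on_timeout)
    signal.alarm(timeout_secs)
    try:
        payload = b"san-liveness-probe"
        fd, path = tempfile.mkstemp(dir=mount_point)
        try:
            os.write(fd, payload)
            os.fsync(fd)          # force the write through to the device
            os.close(fd)
            with open(path, "rb") as f:
                return f.read() == payload
        finally:
            os.unlink(path)
    except (SanCheckTimeout, OSError):
        return False              # hung or erroring storage counts as dead
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
```

Run a check like this against every mounted file system from your regular monitoring, and alert when it returns False.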

SANs are great technology, but there is an increasingly large "literature" of SAN failures on the net, such as this overview from Arjen Lentz and this example of a typical failure. You need to design mission-critical systems with SAN failures in mind. If you can't, consider avoiding SAN use entirely.

Wednesday, June 17, 2009

Lots of New Tungsten Builds--Get 'Em While They're Hot

There is a raft of new Tungsten open source builds available for your replication and clustering pleasure. Over the last couple of days we uploaded new binary builds for Tungsten Replicator, Tungsten Connector, Tungsten Monitor, and Tungsten SQL Router. These contain the features described in my previous blog article, plus more bug fixes than I had expected (36 on Tungsten Replicator alone), thanks to a debugging fest over the last few days that knocked off a bunch of issues. You can pick up the builds on the Tungsten download page. Docs are posted on the Tungsten wiki.

If you have questions, see problems with the builds, or just want to tell us how great they are, please post on the community forums or on the tungsten-discuss mailing list.

Our next open source release will be the Tungsten Manager, which is long overdue to join the family of regular builds. We are doing some polishing work on the state machine processing and group communications, after which the Manager will go out along with documentation on how to use it.

Wednesday, June 10, 2009

Tungsten Development News - Lots of New Features!

Articles on this blog have been pretty scanty of late for a simple reason--we have been 100% heads-down in Tungsten code since the recent MySQL Conference. The result has been a number of excellent improvements that are already in Subversion and will appear as open source builds over the next couple of weeks.

Tungsten has a simple goal: create highly available, performant database clusters using unaltered commodity databases that are simple to manage and that look as close as possible to a single database from the application's point of view. Over the last two months we completed the integration of individual Tungsten components necessary to make this happen.

Full integration is a big step forward and finally gets us to the ease-of-use we were seeking. Imagine you want to add a slave database to the cluster. There's no management procedure any more--you just turn it on. Managers in the cluster automatically detect the new slave and add it as a data source. That's the way we want every component to work from top to bottom--either on or off, end of story. It was really nice to see it start to work a few weeks ago.

We are now ready to start pushing builds out to the Tungsten project. Here is a selection of the features:

Tungsten Replicator -- API support for seamless failover (including flush events), certification on Solaris, better Windows support, testing against MariaDB, and many other improvements. There are already 26 fixes in JIRA and I expect more before we post the build.

Tungsten SQL Router -- Pluggable load balancing with session consistency support. Session consistency means users see their own writes but can read changes by other users from a slave. It works using a single database connection, which is an important step toward eliminating application changes in order to scale on master/slave clusters.
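The session consistency idea can be illustrated with a small sketch. This is not Tungsten's actual code--the class and method names are invented for the example--but it shows the core routing rule: remember the master position of each session's last write, and send a read to a slave only once that slave has applied past that position.

```python
class SessionConsistencyBalancer:
    """Illustrative sketch of session-consistent load balancing:
    users see their own writes, but can still read other users'
    changes from a slave that has caught up far enough."""

    def __init__(self, master_name):
        self.master = master_name
        self.last_write_seqno = {}  # session id -> master seqno of last write

    def route_write(self, session_id, master_seqno):
        # Writes always go to the master; remember where this session wrote.
        self.last_write_seqno[session_id] = master_seqno
        return self.master

    def route_read(self, session_id, slave_seqnos):
        # slave_seqnos: {slave name: last seqno that slave has applied}
        need = self.last_write_seqno.get(session_id, 0)
        for name, applied in slave_seqnos.items():
            if applied >= need:
                return name         # this slave has the session's writes
        return self.master          # no slave caught up; fall back to master
```

A session that has never written (or whose writes have propagated) reads from a slave; immediately after a write, its reads stick to the master until a slave catches up.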

Tungsten Manager -- Directory-based management model that allows you to view and manage both JMX-enabled services and regular operating system processes that follow the familiar LSB pattern of 'service name start/stop/restart'. The managers use group communications and can broadcast commands across multiple hosts, handle failures, and automatically detect new services as they come online.

Tungsten Monitor -- Improved monitoring of replicator status including slave latency, which is necessary to guide SQL Router load balancing features like session consistency.

There's a lot going on with Tungsten right now, in fact far too many things to mention even in a longish post like this one. One of my current code projects is to implement built-in backup and restore for Tungsten Replicator. I am planning on supporting slave auto-provisioning: a new slave comes up, restores the latest backup, and starts replicating. All you have to do is turn the slave on. (More of that on/off stuff--it's kind of an obsession for us at this point.)

Integrating backup/restore is the final big feature for Tungsten Replicator 1.0--after this we plan to turn our attention to parallel replication and are already discussing how this might work with several potential customers. Feel free to contact me through this blog or, better yet, post on the parallel replication topic in the community forums to join the conversation.

One final bit of news: we are starting to work seriously on Tungsten PostgreSQL integration thanks to a new partnership between Continuent and 2nd Quadrant. This work is commercially focused for now but will lead to additional open source features in the not too distant future. Keep watching this space... :)

p.s., We also had a nice refit on the community website. Check it out.

Scaling Databases Using Commodity Hardware and Shared-Nothing Design