Wednesday, November 16, 2011

Why So Many Proprietary Rewrites of MySQL and InnoDB?

Every couple of weeks or so I get marketing email from a Continuent competitor advertising a closed-source clone of MySQL. It is said to be pumped up on illegal substances and therefore the solution to all my problems.  I like this sort of spam because it makes it easier to track what the neighbors are up to.  However it does bring up a question.  Why are so many companies offering what amount to proprietary replacements of MySQL?  This does not mean alternative builds like Percona or MariaDB.  It means products like ClustrixSchooner, or Xeround, which replace MySQL entirely, or like ScaleDB, or Tokutek, which replace InnoDB.  

There's nothing wrong with proprietary software, of course.  And there is nothing wrong with rewriting things to make them better.  The rewrites are a tribute to the vitality of the MySQL marketplace and in some cases quite clever as well.  However, the proprietary offerings tend to obscure an important truth about MySQL.  Most businesses that run on open source software have problems with MySQL management, not with MySQL itself.

Here is a simple example.  Say you have 2 Terabytes in MySQL 5.1.  How do you upgrade from MySQL 5.1 to 5.5 without incurring an application outage?  This is a big problem for 24x7 web-facing applications.  You don't need to rewrite MySQL to do zero-downtime upgrades.  MySQL with InnoDB already works fine.  You just need a way to shift connections transparently to a new master database, upgrade the old master, and shift back when you are done.  Similar reasoning applies for slave provisioning, automated failover, spreading load over replicas to improve performance, or operating across multiple sites.

At Continuent we concluded a number of years ago that you don't need to change MySQL to manage data effectively.  We therefore designed Tungsten Enterprise, Continuent's commercial clustering solution, to work with unaltered MySQL. Tungsten Enterprise uses master/slave replication (i.e., my favorite replicator), distributed management, and transparent connectivity to make a set of standard MySQL or PostgreSQL servers look like a single highly available DBMS that distributes load across all replicas.   This architecture has tremendous advantages, because it complements the strengths of MySQL itself.   Here are a few of the principle benefits.

Transparency.  Everything runs on standard MySQL from initial development to large-scale deployment.  Application code runs the same way on a dev laptop or production.  Application bugs in production are reproducible on the laptop.  Standard MySQL configuration and tuning also work, because this is standard MySQL.

No lock-in.  Don't like Tungsten Enterprise?  Use something else or revert back to simple MySQL.  There's no need to change your database or migrate data.

Data integrity.  InnoDB has had years to shake out bugs, especially those involving data corruption.  There are still a few but they do not typically show up unless there is a bad hardware failure or you configure your system incorrectly.  (Hint #1: don't use MyISAM.)  Do you really want to give this up for a new store implementation?

Scalability.  MySQL performance is improving constantly, driven by competition between builds, an active community, investment from Oracle and large web properties like Facebook.  SSDs are also increasingly affordable and make a lot of performance problems evaporate.  As MySQL improves in this and other areas, you get the benefits.  The trick is to have a way to upgrade.  I mentioned the MySQL 5.1 to 5.5 upgrade problem for precisely this reason.

Operational costs.  There is a deep pool of administrators and administrative tools for MySQL.  Thanks to books like High Performance MySQL, abundant talks, and a wealth of community resources as well as consulting, there is little mystery about how things work.  I probably don't even need to discuss license costs.

Viability.  MySQL is not going anywhere.  Oracle is continuing to invest in the core database, and Percona, MariaDB and most important Microsoft will ensure Oracle stays on its toes.  At Continuent we do our best to keep our friends at Oracle competitive on replication.  Innovation on open source MySQL will continue for years to come.  (Psst, MySQL guys at Oracle are welcome to come work for us. :)

Given the number of advantages that off-the-shelf MySQL confers, the real question is why our approach is not more popular.  Actually it is.  For all the marketing attention generated by proprietary MySQL or InnoDB rewrites, many hundreds of billions of transactions per day run on unaltered MySQL.  Switching to proprietary versions of MySQL is a substantial wrench for most businesses, because the economics run so strongly in favor of open source DBMS.   However, the open source tools for managing MySQL are by-and-large inadequate, in part because some of the problems turn out to be rather difficult to solve.   

When we started to rethink database clustering at Continuent back in 2007, we therefore focused on solving the problems outside MySQL that make data management hard.  That includes building fast replication with global transaction IDs, so you can fail over easily to up-to-date live replicas.  It includes building distributed, rule-based management that has simple primitives like "recover" to fix a broken slave.  It includes speedy, transparent connectivity that can spread reads intelligently across multiple servers and reroute connections transparently to allow maintenance without halting applications.  Finally, it includes simplifying management so that users don't spend much time worrying about their data.  These capabilities are now very robust and help customers handle hundreds of millions of transactions per day.  

It is obvious off-the-shelf MySQL (and PostgreSQL too) are already very good and continuing to get better.  For most users there is no need to migrate to proprietary offerings that give up the leverage conferred by open source databases.  Tungsten Enterprise solves the difficult problems that are critical to building businesses on standard MySQL.   If you are building new systems based on MySQL or scaling old ones you should look hard at what we have done.   

5 comments:

Sheeri K. Cabral said...

Tokutek doesn't replace InnoDB (and Percona does, by the way, with XtraDB).

TokuDB is a storage engine, and there is an open-source patched version of MySQL needed to use the storage engine, but you can also use InnoDB on the same server you use TokuDB on.

Robert Hodges said...

The point, though, is that you put your data into Tokutek instead of InnoDB.

Tokutek incidentally is an example where a special engine seems appropriate, as it greatly improves performance of large tables. If you have that problem it's definitely interesting to look at. They had a good webinar earlier this week, though I don't recall mention of the patch. Thanks for pointing that out.

Sheeri K. Cabral said...

*nod* I agree that Tokutek is a special case, but there really isn't a way to avoid having a separate storage engine that *can* handle hundreds of millions of rows with transactional functionality.

So yes, in order to use TokuDB, you have to insert data into the storage engine.....but you have to do that with Percona also, and XtraDB. (That hasn't stopped us from using the Percona patchset for almost all our clients).

Robert Hodges said...

I don't regard XtraDB as different from InnoDB. It's a patched version that is a drop-in replacement. If you don't like it you can restart normal MySQL. That's part of the reason Percona is so popular. It would be a big step to break that, and I hope it does not happen anytime soon.

Once you get back to open source builds there's a completely different dynamic for innovation. We learned this the hard way at Continuent through products like m/cluster and therefore took a different approach with Tungsten.

Alex said...

"You just need a way to shift connections transparently to a new master database, upgrade the old master, and shift back when you are done."

Really now. You could just shift connections to the new database and be done with it.

Scaling Databases Using Commodity Hardware and Shared-Nothing Design