Thursday, May 15, 2008

Tungsten Scale-Out Stack Presentation from MySQL Conference

There have been a number of requests for copies of the slides to the Tungsten Scale-Out Stack talk I gave at the MySQL Conference in April. Here they are courtesy of the nice folks at O'Reilly who organized the conference.

Tungsten is our codename for a set of technologies to raise database performance and availability using scale-out. In the database world scale-out is a term of art that means spreading data across servers on multiple systems. With data in multiple places you are less subject to failures--when one copy crashes you just use the others. Similarly, if your application runs a lot of queries, you can spread them over different machines, which makes for faster and more stable response times.

So database scale-out sounds great (and is too), but getting it to work properly is harder than you would think. Along with practical issues like management, there are theoretical barriers. Let's say you are creating a product catalog service using database replicas on different hosts. Applications connect to any replica to get information. Your manager, a guy with pointed hair, tells you to make sure of the following:

1. The catalog service is always available.
2. The service keeps working even if you get a network partition between hosts.
3. The copies are always consistent (e.g., you can go to any copy and get the same data).

Here's an ugly surprise. It turns out your data service can only have two of the three properties at any given time, a result that was proven only recently and is now called the CAP Principle. If you want to be available and handle network partitions, you must accept that data will sometimes be inconsistent. Your manager is going to be very disappointed.

That's where we get back to Tungsten and the Scale-Out Stack. We realized a while back that you can't think in terms of a single product or even family of products to solve scale-out in a general way. It's better to design a flexible set of technologies with different strengths and weaknesses that users choose based on what's important to them. If you need to cluster over a WAN, use master/slave replication. If you don't want master failures, use synchronous replication in middleware.

Read the slides to learn more about the thinking. Database scale-out is a fascinating problem and we are looking forward to making it much easier to handle. Please stay tuned! I'll be writing more about this in the weeks and months to come.

Wednesday, May 7, 2008

What Else *Would* Oracle Say?

This just in. In a long interview on Linux Voices, Oracle's Linux architect Edward Screven comments on the MySQL/Sun acquisition.

...we just don’t care. I mean, we don’t see MySQL very often, again, in competitive deals. It’s out there, but it’s not very often that a database sales rep comes back and says, “I had to compete for the business against MySQL.”

To be fair the question is about how the MySQL acquisition affects Linux. But it seems really hard to believe Oracle does not care about MySQL. This is the same company that bought InnoDB. There is no doubt that Oracle is watching developments at Sun very carefully. The interesting problem for Oracle is not simply that Sun now has MySQL. It is that Sun owns or backs a portfolio of open source databases. And there are plenty of companies besides Sun that are working to make those databases full-featured, highly available, and very scalable. Like my company, Continuent, for example.

With a small number of additional acquisitions, Sun could control the open source end of the market fully. Now that has to be a little disquieting.

Sunday, May 4, 2008

MySQL, Sun, and the Future of Open Source Databases

So what's it like now that Sun now owns MySQL? The executive summary: a little weird. I was at the MySQL User Conference a couple of weeks ago and had a chance to talk with a lot of people in the community as well as many MySQL folks. Marten Mickos is now the head of database products at Sun. It's not very hard to figure out what Sun will do with MySQL products for the near future--pretty much what MySQL was doing already.

The real question for a lot of people is what will happen with databases like PostgreSQL and Derby. Sun has invested heavily in both of them, and PostgreSQL in particular is now quite fast. With the MySQL acquisition, Sun has an opportunity to run the table with multiple offerings that cover both enterprise applications as well as web and embedded. However, that would mean cutting down the MySQL roadmap to concentrate, for example, on scale-out rather than scale-up. It would also require thinking big to combine with other vendors in order to disrupt the market leader Oracle. Done right, there's a chance to upend the industry in a way that has not occurred since Microsoft muscled into databases in the early 1990s using code bought from Sybase.

Based on talks from people like Rich Green and Marten Mickos, it's hard to see this happening. Sun is taking a hands-off approach to MySQL *and* giving MySQL management control of overall database strategy. A disruptive change therefore seems unlikely. In fact, the more likely result is stagnation, now that MySQL no longer has to fight for its existence. The MySQL roadmap is still pretty diffuse and there has been little product movement since the 2007 User Conference. MySQL 5.1 is still not out the door. Falcon is likely to show up ready for production use around the time the Boeing Dreamliner rolls out. MySQL is still working on multiple storage engines (2 new ones plus NDB and MyISAM, to name a couple.) There's not even a glimmer of a date for cool new replication features like a pluggable replication interface. In short, not much evidence for radical changes of any kind.

Also, there must be the awful temptation to focus on vertical scaling so that MySQL can work on Sun hardware with large numbers of cores. I asked Marten Mickos specifically about the choice between scaling up and scaling out but didn't get a very clear answer. Personally I think for MySQL to concentrate very hard on vertical scaling would be a strategic error. The community that made MySQL great is into commodity hardware and scale-out in a big way. First rate support for highly scaled SMP architectures is going to be a long slog that will compromise delivery of many other features.

Given all of this it's hard not to see innovation, particularly in problems like scale-out, shifting away from MySQL to other databases as well as middleware. This would be a great time for the PostgreSQL community to get really serious about data replication. MySQL won't fade--it's already a great database. But there's likely to be a crowd of people in the MySQL community eying other solutions. It's going to be interesting to see what they come up with.

Scaling Databases Using Commodity Hardware and Shared-Nothing Design