Tuesday, September 1, 2009

The Future of Database Clustering

Baron Schwartz started a good discussion about MMM use cases that quickly veered into an argument about clustering in general. As Florian Haas put it on his blog, this is not just an issue of DRBD vs. MySQL Replication. Is a database cluster something you cobble together through bits and pieces like MMM? Or is it something integrated that we can really call a cluster? This is the core question that will determine the future of clustering for open source databases.

I have a strong personal interest in this question, because Tungsten clustering, which I designed, is betting that the answer is changing in two fundamental ways. First, the problems that clustering solves are evolving, which will in turn will lead to significant changes in off-the-shelf clusters. Second, for most users the new clusters will be far better than solutions built from a bunch of individual pieces.

To see why, let's start with some history of the people who use open source databases and why they have been interested in clustering over the last decade or so. Open source databases have a wide range of users, but there are a couple of particularly significant groups. Small- to medium-sized business applications like content management systems are a very large segment. Large web facing applications like Facebook or GameSpot are another. Then there are a lot of custom applications that are somewhere in between--too big to fit on a single database dual- or quad-core server but completely satisfied with the processing power of 2 to 4 servers.

For a long time all of these groups of users introduced clusters for two main reasons: ensuring availability and raising performance. Spreading processing across a cluster of smaller commodity machines was a good solution to both requirements and explains the enormous popularity of MySQL Replication as well as many less-than-successful attempts to implement multi-master clustering. However the state of the art has evolved in a big way in the last couple of years.

The reason for change is simple: hardware. Multi-core architectures, cheap DRAM, and flash memory are changing not just the cost of databases but the fundamental assumptions of database computing. Pull out your dog-eared copy of Transaction Processing by Gray and Reuter, and have a look at the 1991 price/performance trade-offs for memory inside the front cover. Then look at any recent graph of DRAM and flash memory prices (like this one). For example, within a couple of years it will be practical to have even relatively large databases on SSDs. Assuming reasonable software support random reads and writes to "disk" will approach main memory speeds. Dirt-cheap disk archives are already spread across the Internet. The old graph of costs down to off-line tape has collapsed.

Moreover, open source databases are also starting to catch up with the hardware. In the MySQL community both MySQL 5.4 and Drizzle are focused on multi-core scaling. PostgreSQL has been working on this problem for years as well. Commercial vendors like Schooner are pushing the boundaries with custom appliances that integrate new hardware better than most users can do it themselves and add substantial database performance improvements to boot.

With better multi-core utilization plus cheap memory and SSDs, the vast majority of users will be able to run applications with adequate performance on a single database host rather than the 2 to 4 nodes of yore. In other words, performance scaling is rapidly becoming a non-issue for a larger and larger group of users. These user don't need infinite performance any more than they need infinite features in a word processing program. What's already there is enough, or will be within the next year or two.

Performance is therefore receding as a motivation for clustering. Meanwhile, here are three needs that will drive database clustering of open source SQL databases over the next few years.
  1. Availability. Keeping databases alive has always been the number one concern for open source database users, even back in the days when hosts and databases were less capable. This is not a guess. I have talked to hundreds of them since early 2006. Moreover most users just don't have the time to cover all the corner cases themselves and want something that just works without a lot of integration and configuration.
  2. Data protection. Losing data is really bad. For most users nirvana is verified, up-to-the-minute copies of data without having to worry a great deal about how it happens. Off-site protection is pretty big too. Talk to any DBA if you don't believe how important this problem is.
  3. Hardware utilization. With the dropping cost of hardware, concerns about up-front hardware investment are becoming somewhat outdated. Operational costs are a different matter. Let's look at power consumption and assume a dual CPU host drawing 250W, which we double to allow for cooling and other overhead. Using recent industrial electricity rates of 13.51 cents per kilowatt/hour in California you get an electric bill of around $600 per year. Electricity is just one part of operational expenses, which add up very quickly. (Thanks to an alert reader for correcting my math in the original post.)
We will continue to see database clusters in the future: in fact lots of them. But off-the-shelf clusters that meet the newer requirements in an efficient and cost-effective way for open source databases are going to look quite different from tightly coupled master/master or shared disk clusters like Postgres-R and RAC. Instead, we will see clusters based for the most part on far more scalable master/slave replication and with features that give them many of the same cluster benefits but cover a wider range of needs. To the extent that other approaches remain viable in the mass market, they will need to cover these needs as well.
  • Simple management and monitoring - The biggest complaint about clustering is that it's complicated. That's a solvable problem or should be once you can work with master/slave methods instead of more complex approaches. You can use group communications to auto-discover and auto-provision databases. You can control failover using simple, configurable policies based on business rules. You can schedule recurring tasks like backups using job management queues. You can have installations that pop up and just work.
  • Fast, flexible replication - Big servers create big update loads and overwhelm single-threaded slaves. We either need parallel database replication or disk-level approaches like the proposed PostgreSQL 8.5 log-streaming/hot standby or DRBD. Synchronous replication is a requirement for many users. Cross-site replication is increasingly common as well. Finally, replication methods will need to be pluggable, because different replication methods have different strengths; replication itself is just one part of the clustering solution, which for the most part is the same regardless of the replication type.
  • Top-to-bottom data protection - Simple backup integration is a good start, but the list of needs is far longer: off-site data storage, automatic data consistency checks, and data repair are on the short list of necessary features. Most clustering and replication frameworks offer little or nothing in this area even though replica provisioning is often closely tied to backups. Yet for many users integrated data protection will be the single biggest benefit of the new clustering approach.
  • Partition management - In the near future most applications will fit on a single database server, but most organizations have multiple applications while ISPs run many thousands of them. There need to be ways to assign specific databases to partitions and then allow applications to locate them transparently. This type of large-scale sharding is the problem that remains when single application databases can run on a single host.
  • Cloud and virtualized operation - In the long run virtualization is the simplest cure for hardware utilization problems--far easier and more transparent than other approaches. A large number of applications now run on virtual machines at ISPs or in cloud environments like Amazon for this reason. To operate in virtual environments, database clusters must be software only, have simple installation, and make very minimal assumptions about resources. Also, they need to support seamless database provisioning to as capacity needs rise and fall, for example adding new VMs or provisioning an existing 4 core VM to a larger 8-core VM with more memory as demand shifts.
  • Transparent application access - Applications need to be able to connect to clusters seamlessly using accustomed APIs and without SQL changes. This is actually easier to do on databases that use simple master/slave or disk block methods rather than more complex clustering implementations. (Case in point: porting existing applications to MySQL Cluster.) Also, the application access needs to be able to handle simple performance-based routing, such as directing reports or backups to a replica database. The performance scaling that most users now need is just not that complicated.
  • Open source - For a variety of reasons closed approaches to clustering are doomed to insignificance in the open source database markets. The base clustering components have to be open source as some of them will depend on extensions of existing open source technology down to the level of storage and database log changes. You also need the feedback loops and distribution that open source provides to create mass-market solutions.
What I have just described is exactly what we are building with Tungsten. Tungsten is aimed at the increasingly large number of applications that can run on a single database. We can help with database performance too, of course, but we recognize that over time other issues will loom larger for most users. The technical properties described above are tractable to implement and we have a number of them already with more on the way in the near future. Master/slave clustering is not just feasible--it works, and works well for a wide range of users.

Still, I don't want anyone to mistake my point. There are many applications for which performance is a very serious problem or whose other needs cannot possibly be met by off-the-shelf software. Facebook and other large sites will continue to use massive, custom-built MySQL clusters as well as non-SQL approaches that push the state of the art for scaling and availability. Analytics and reporting will continue to require ever larger databases with parallel query and automatic partitioning of data as Aster and GreenPlum do. There are specialized applications like Telco provisioning that really do require a tightly coupled cluster and where it's worth the effort to rewrite the application so it works well in such an environment. These are all special cases at the high end of the market.

Mainstream users need something that's a lot simpler and frankly more practical to deliver as an off-the-shelf cluster. Given the choice between combining a number of technologies like MMM, backups of various flavors, cron jobs, Maatkit, etc., a lot of people are just going to choose something that pops up and works. The hardware capability shift and corresponding database improvements are tilting the field to clustering solutions like Tungsten that are practical to implement, cover the real needs of users, and are fully integrated. I'm betting that for a sizable number of users this is the future of database clustering.

p.s., We have had a long summer of work on Tungsten, which is why this blog has not been as active as in some previous months. We are working on getting a full clustering solution out in open source during the week of September 7th. For more information check out full documentation of open source and commercial products here.

5 comments:

Robert Young said...

If you're interested, I've been running a blog recently devoted to SSD and BCNF.

Ants Aasma said...

Just wanted to call you on the power-consumption thing, your calculations are off by an order of magnitude. 100W is 876kWh/year. That will not result in $1000 power costs in 10 months, but rather in 100 months.

I don't argue with the principle, dual processor servers consume something more like 250W, and in hot climates and non-green datacenters you can easily multiply that by two to account for A/C and UPS overhead. That will get you to $1000 in 20 months.

Robert Hodges said...

@Ants,

Thanks for the correction. You are exactly right and I just found where the extra zero came from! I will adjust the post.

Frank Flynn said...

The bigger point on power should not be the cost on your electric bill. Rather most of these machines are in 3rd party data centers where the power available is the gating factor. If you go to most any data center and look around you will typically see racks that are half full. This is because they have already consumed the maximum power available.

My point being the dollars on your electric bill are not nearly the whole story, you must include the cost of your data center space including the wasted space as you increase the power density.

Robert Hodges said...

Excellent point. In fact, that's why utilization analyses needs to take into consideration complete operational costs, as under-availability of power would result in under-utilization of other resources, hence raising costs.

Scaling Databases Using Commodity Hardware and Shared-Nothing Design