Tuesday, April 3, 2012

Solving the Cloud Database Memory Conundrum

Cloud databases have a memory problem.   Continuent has been doing a lot of Amazon deployments lately, and it is becoming apparent that memory utilization in those environments is more than just an inconvenience.  In this article I would like to discuss the memory problem that we see in customer implementations and some new features of Tungsten Enterprise that help alleviate it for MySQL.

The Cloud Memory Problem and Database Arrays

As I discussed in a recent article about prefetch, the amount of RAM allocated to the InnoDB buffer pool is one of the principle determinants of MySQL performance.  The speed difference between using a page in the buffer pool vs. getting it from storage is commonly about 100:1 on RAIDed disk. The other determinant is working set size, i.e., the percentage of pages that need to be memory-resident for adequate performance. Working set size is variable and depends on your query patterns as well as the level of indexing on tables. These two variables set a limit to the amount of data you can manage and still access information quickly. Here is a table that shows the relationship.


Max GB of manageable storage
Buffer Pool Size (GB)
5% resident
10% resident
25% resident
50% resident
15
300
150
60
30
30
600
300
120
60
60
1200
600
240
120
120
2400
1200
480
240

The largest instance currently available from Amazon EC2 is a Quadruple Extra Large, which offers 68GB of RAM.  Let's assume we allocate 85% of that to the buffer pool, which is about 58GB.   We'll be generous and say 60GB so it matches my table. Assuming our application performs reasonably with 10% of pages resident we can then manage a maximum of 600GB of stored data per server.

The fact that EC2 instances are rather small understates the difficulty with RAM in Amazon.  You also need more of it.  Amazon EBS is slower than typical on-premise storage, such as your friendly direct-attached RAID, and has highly variable performance as Baron Schwartz pointed out.  You might therefore be inclined to double the working set ratio to 20%, since I/O is so costly you need a bigger working set to minimize reads from storage.  Quadruple Extra Large instances then max out at 300Gb of managed data.  I'm not forgetting SSDs, which are slowly appearing in cloud environments.  They alter the working set ratios but introduce other issues, for example related to cost and capacity.  It's also not clear how much faster they really are in shared environments like Amazon.

This simple storage math makes it obvious that managing data in the cloud requires a mental shift from fewer but larger DBMS servers to groups of much smaller servers, which we can describe as database arrays. Many small SaaS businesses generate a Terabyte of customer data in a year, hence would provision 4 new 300GB servers annually.  Mobile and market automation apps generate many Terabytes per year.  I recently did a capacity calculation for a customer where we hit 50 servers without even considering replicas to ensure availability.   

Database arrays therefore address memory by partitioning storage.   Many applications already partition into shards for other reasons.  In this case, the main question is how the applications find data easily.  Locating "virtualized databases" is a non-trivial problem, especially when you consider that servers may fail, move, or go offline for maintenance.  The more servers you have, the more likely the special cases become.  This is where Tungsten Enterprise comes in. 

Tungsten Enterprise 1.5 Features for Cloud Database Arrays

Tungsten Enterprise builds clusters by taking a set of off-the-shelf DBMS servers linked by replication and making them look like a single virtualized DBMS server to applications.   We call this server a data service.  The key to the single-DBMS illusion is the Tungsten Connector, a fast proxy that sits between applications and database hosts.   It routes connections into the right type of server, for example a master for writes or a slave for reads.  It also allows you to switch masters without stopping applications or losing data.

Tungsten Enterprise already offers several nice properties for cloud operation.  The Connector is software-only and does not use VIPs, which are not supported in Amazon anyway.  Also, the fact that you can switch master and slaves without stopping applications makes it possible to "escape" slow EBS volumes.  That said, previous Tungsten versions limited connections to a single data service per connector, which made database arrays hard to support properly.

Tungsten Enterprise 1.5 adds two features to help with constructing and managing arrays.  First, our colleague Jeff Mace created a great Ruby-based install program called tpm that can install complex cluster topologies from the command line.  If you have used the tungsten-installer program on Tungsten Replicator, you have an idea what is on the way. (If not Jeff will be talking about installations at the Percona Live MySQL Conference on April 11.)   Efficient command-line installations make it easy to set up and manage a lot of clusters in parallel.

Second, thanks to work by Ed Archibald and Gilles Rayrat, the Connector now supports multi-service connectivity.  Connectors previously kept track of cluster state by connecting to one of a list of managers for the local data service.   We extended this to allow Connectors to track state in multiple data services. We developed the feature to support disaster recovery sites, which are another Tungsten 1.5 feature.  As it turns out, though, multi-service connectivity is extremely useful for database arrays in the cloud.  The following diagram compares single and multi-data service connectivity.


As with a single data service the Connector receives state changes providing location of the data service master and slaves as well as whether each is online or offline.  The Connector uses a simple routing scheme based on logins to route SQL connections to the correct server for whatever it is doing.  Beyond that, Tungsten Enterprise takes care of switching masters and load balancing reads that are necessary to use an array efficiently.  

Setting up Multi-Service Connectivity

Multi-service access is trivial to set up.  This is documented more fully in the forthcoming Tungsten Enterprise 1.5 documentation, but here is the basic idea.

First, each Tungsten Enterprise installation contains a file called dataservices.properties in the cluster-home/conf directory.  The Connector uses this file to locate data service managers in order to get cluster state changes.  To enable multi-service connectivity, just ensure there are entries for the managers of each data service in your array before starting the local Connector service.

cluster1=pr1c1.bigdata.com,pr2c1.bigdata.com,pr3c1.bigdata.com
cluster2=pr1c2.bigdata.com,pr2c2.bigdata.com,pr3c2.bigdata.com
cluster3=pr1c3.bigdata.com,pr2c3.bigdata.com,pr3c3.bigdata.com
...

Second, the Connector uses a file called user.map locate in tungsten-connect/conf to define logins.  Edit this file and add a different login for each cluster to user.map.  Save the file and the connector will pick it up automatically.  

# user password service
cluster1 secret cluster1
cluster2 secret cluster2
cluster3 secret cluster3
...

That's about it.  Applications need to make one change, namely to use a different login for each data service.  For some applications, that is quite simple and can be done in a few minutes.  I found to my surprise that some customer applications even do it already, which is very convenient.  For others, it may require more extensive changes.   The good news is that improvements are on the way.  

What's Next

Tungsten Enterprise 1.5 is a solid step toward enabling large-scale data management using MySQL database arrays.  However, we can do much more.  Obviously, it would be easier if the Connector routed to the right DBMS server using the database name on the connect string or by detecting use commands in SQL.  That's on the way already.

More generally, the whole point of Tungsten is to make off-the-shelf DBMS work better by adding redundancy and optimizing use of resources without forcing users to migrate or rearrange data.  Memory is a key resource in cloud environments and deserves special attention.  Tungsten connectivity looks like a powerful complement to the caching that already occurs within databases.  It's also a natural extension of the work we have been doing to speed up replication.

Looking to the future, we are investigating more intelligent load balancing that sends queries to servers where the pages they need are most likely to be already in the buffer pool.  This type of optimization can double or triple effective buffer pool size in the best cases.  Query caching also offers potential low-hanging fruit.  Finally, management parallelization is critical for array operation.  I foresee a long list of Tungsten improvements that are going to be fun to work on and that go a long way to solving the cloud database memory conundrum.

p.s., Continuent has a number of talks scheduled on Tungsten-related topics at the Percona Live MySQL Conference and the SkySQL MariaDB Solutions Day April 10-13 in Santa Clara.  For more on Tungsten Enterprise and EC2 in particular, check out Ed Archibald's talk on April 11.  Giuseppe Maxia and I will also touch on these issues in our talk at at the SkySQL/MariaDB Solutions Day.

No comments:

Scaling Databases Using Commodity Hardware and Shared-Nothing Design