The Scale-Out Blog: Data Fabric Design Patterns: Transactional Data Service

This article is the second in a series on data fabric design and introduces the transactional data service design pattern. The previous article in this series introduced data fabrics, which are collections of off-the-shelf DBMS servers that applications can connect to like a single server. They are implemented from data fabric design patterns, which are reusable arrangements of DBMS servers, replication, and connectivity. With this article we begin to look at individual design patterns in detail.

Description and Responsibilities

The transactional data service is a basic building block of data fabric architectures. A transactional data service is a DBMS server that processes transactions submitted by applications and stores data safely. Transactional data services have the following responsibilities:

Store data transactionally and recover data faithfully to the last full transaction following failure.
Provide a network-accessible application interface for accessing data.
Provide a reliable and reasonably quick method for backup and restore.
Maintain an accessible, serialized log of transactions. This enables replication between services.

The following diagram illustrates the moving parts of a transactional data service. In future diagrams we will just use the standard database symbol for the entire transactional data service, but for now we need to be able to see the contents.

Motivation

Durable storage of transactions is the most fundamental responsibility of database systems. It is difficult to build reliable applications if stored data can disappear or become corrupted because transactions were not committed before a crash. Both problems can cause data loss. Moreover, they can break replication links very badly if the DBMS server comes up in an inconsistent state, for example with some updates committed but others randomly rolled back. This condition affects not only the one server but potentially many others throughout the fabric.

The transactional data service therefore focuses on storing data safely and recovering to the last committed transaction after a restart. With this basic capability we can construct more complex services knowing that individual changes are unlikely to disappear or be recorded inconsistently.

Detailed Behavior

Let's look in detail at the properties required for a successful transactional data service. Somewhat surprisingly, an off-the-shelf SQL DBMS does not necessarily fit the pattern, though it comes close. It is important to understand the differences.

The transactional store keeps data from getting lost and is the basis for recovery throughout the fabric. Transactional stores support commit and rollback with multi-statement transactions. In theory the transactional data service's responsibility for data persistence matches MySQL/InnoDB and PostgreSQL behavior, both of which commit transactions safely in serial order. However, the reality is not quite that simple.

Most DBMS allow applications to ignore transactions under certain conditions. This results in wormholes, which are violations in serial ordering of data. There are a number of table definition options in SQL that undo transactional consistency.

(MySQL) MyISAM table type. MyISAM tables ignore transactions and commit immediately, even if the application later tries to roll back. The tables may also become corrupted by server crashes.
(MySQL) Memory table type. These tables are maintained in memory only. They disappear on restart.
(PostgreSQL) UNLOGGED tables. Such tables are not logged and disappear on crash or unclean shutdown (thanks Frederico).

All of these allow data to disappear or become corrupted after a crash. However, there is a more subtle problem. If applications depend on these tables, transaction results may then depend on when the server last crashed or restarted, which in turn makes updates across replicas non-deterministic. Random updates create problems for data replication, which depends on replicas behaving identically when transactions are applied. It is important to avoid application dependencies on any feature that creates wormholes or you might not be able to use other design patterns.
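One way to catch these wormholes early is to scan the catalog for the table types listed above. The following is a minimal sketch in Python rather than a finished tool. The function names and the row format (tuples of schema, table, and engine or persistence flag) are assumptions made for illustration. The two query strings read standard catalog views (information_schema.tables in MySQL, pg_class in PostgreSQL) and would be run with whatever client library you already use.

```python
# Engines the article identifies as non-transactional or non-durable.
# The set is illustrative, not exhaustive.
UNSAFE_MYSQL_ENGINES = {"myisam", "memory"}

# Catalog queries that could feed the checks below (run with any client).
MYSQL_ENGINE_QUERY = (
    "SELECT table_schema, table_name, engine "
    "FROM information_schema.tables "
    "WHERE table_schema NOT IN "
    "('mysql', 'information_schema', 'performance_schema')"
)
POSTGRES_PERSISTENCE_QUERY = (
    "SELECT n.nspname, c.relname, c.relpersistence "
    "FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace "
    "WHERE c.relkind = 'r'"
)


def find_mysql_wormholes(rows):
    """Return 'schema.table (ENGINE)' for tables using an unsafe engine."""
    return [
        f"{schema}.{table} ({engine})"
        for schema, table, engine in rows
        # engine is NULL (None) for views, so skip those rows.
        if engine is not None and engine.lower() in UNSAFE_MYSQL_ENGINES
    ]


def find_postgres_wormholes(rows):
    """Return 'schema.table' for tables whose relpersistence is 'u' (unlogged)."""
    return [
        f"{schema}.{table}"
        for schema, table, persistence in rows
        if persistence == "u"
    ]
```

Running a check like this as part of deployment makes it harder for an unsafe table to slip into a schema that other design patterns depend on.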

So is data loss always bad? Surprisingly, no. In some cases transactional stores can lose data provided that they do so by dropping the last transactions in serial order. It's as if the data just reverted to an earlier point in time. To understand why this might be OK, imagine three servers linked into a chain by asynchronous replication.

Data loss is sometimes not a big deal

It is bad to lose data on the first server, especially if those data are lost before replicating transactions to the downstream replicas. However, data loss on the last server is fine. Assuming that server stores the replication restart point transactionally, it will just re-apply the missing transactions and catch up. This is exactly what happens when you restore a backup on a slave in master/slave replication.

Data loss on the second server also may not be a problem. It should restart replication and should generate identical transactions for itself as well as for replication to the last server. In both cases we assume that replication will handle these cases correctly and can replay missing transactions from logs. If so, you can not only tolerate such losses but even depend on recovering from them automatically.
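The catch-up behavior described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how any particular replicator is implemented: the Replica class, its methods, and the in-memory list standing in for the replication log are all hypothetical. The point it shows is that if the restart point advances together with the data, a replica that loses its last transactions in serial order can replay the log and end up identical to the master.

```python
class Replica:
    """Toy replica: applies a serialized log and records its restart point."""

    def __init__(self):
        self.data = {}      # key -> value, the replicated state
        self.applied = 0    # number of log entries applied (restart point)

    def apply_from(self, log):
        """Apply every log entry past the restart point, in serial order."""
        for key, value in log[self.applied:]:
            # The state change and the restart point advance together,
            # standing in for one transaction that commits both.
            self.data[key] = value
            self.applied += 1

    def lose_last(self, log, count):
        """Simulate a crash that drops the last `count` applied transactions."""
        keep = max(self.applied - count, 0)
        self.data, self.applied = {}, 0
        self.apply_from(log[:keep])


# Serialized transactions as (key, value) updates, in commit order.
log = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
master_state = {"a": 3, "b": 2, "c": 4}

replica = Replica()
replica.apply_from(log)
replica.lose_last(log, 2)        # data reverts to an earlier point in time
state_after_crash = dict(replica.data)
replica.apply_from(log)          # replication replays the missing entries
```

After the final call, replica.data equals master_state. If the crash had instead dropped an arbitrary transaction from the middle of the log, the restart point would no longer describe the data, and replay could not repair it.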

Turning to the next responsibility of the transactional data service, the application interface may obviously include SQL via MySQL or PostgreSQL wire protocols. However, any consistent interface that is accessible over a network will do. The memcached protocol is also perfectly acceptable. Subsets of SQL such as stored procedures work quite well. Transactional data services are more general than SQL DBMS servers in this sense. Full SQL or even a subset is not a requirement.

Backup and restore are critical for data fabrics as they enable provisioning of new services as well as recovery of services that fail. You restore a backup and then let the service catch up using replication. Data fabrics can get along fine using a range of options from logical dumps of the DBMS (mysqldump or pg_dump) to file system snapshots. Fabric backups just need to be transactionally consistent and movable to other hosts.

Note that the backup required by the transactional data service design pattern is a less general form of backup than most businesses really require. Businesses may need to recover data after accidental deletion or to keep copies of information for many years for legal reasons. You can therefore use a general-purpose backup solution like Zmanda or Barman provided it meets the fabric design pattern requirements. There's no need to do things twice.
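Whatever backup method you pick, it is worth checking that a restored copy really matches the source. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for a real DBMS, because it runs anywhere without setup. The table name and the table_fingerprint helper are made up for illustration, and a real check against MySQL or PostgreSQL would compare per-table counts or checksums through the appropriate client instead.

```python
import sqlite3


def table_fingerprint(conn, table):
    """Return (row count, sorted rows) for a table; fine for small test data."""
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    return len(rows), sorted(rows)


# Source database with a couple of committed rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
with source:  # commits on success, rolls back on error
    source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 25)])

# "Restore" the backup into a second database, standing in for a new host.
restored = sqlite3.connect(":memory:")
source.backup(restored)

backup_matches = (
    table_fingerprint(source, "orders") == table_fingerprint(restored, "orders")
)
```

In a fabric the restored copy would then be attached to replication so it can catch up on transactions committed after the backup was taken.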

Finally, the replication log is a serialized list of transactions to replicate to other hosts. Serialization enables the transactions to be replayed on another host and result in an identical copy. Generally speaking, data fabrics require logical replication, which applies changes to replicas using SQL statements on a live server. This is because other design patterns depend on being able to access and even write to the transactional data service when it is acting as a slave. Binary replication methods that copy disk blocks, such as DRBD, do not meet this requirement and therefore are of limited use in data fabrics.

Implementation

You can implement the transactional data service design pattern with any DBMS that meets the pattern responsibilities. That said, implementation details are very important. As we have seen, ensuring that DBMS servers live up to the responsibility to store transactions safely is a little harder than one might think.

1. MySQL. MySQL with InnoDB engine is generally a good choice. It has stable SQL APIs and a wide range of capable client libraries. However, MySQL must be correctly configured to maintain proper transactional guarantees. Here are three properties that should be in your my.cnf file to help ensure MySQL lives up to its responsibilities:

# Ensure durable flush to storage on transaction commit.
innodb_flush_log_at_trx_commit=1

# Synchronize binlog with committed transactions.
sync_binlog=1

# Use InnoDB as default storage engine. (Unnecessary for MySQL 5.5 and above,
# where InnoDB is the default.)
default-storage-engine=InnoDB
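Settings like these tend to drift over time, so it can help to check them automatically. Here is a rough sketch that parses my.cnf text with Python's standard configparser module and reports when the two durability settings above are missing or set to another value. The function name check_my_cnf and the REQUIRED_MYSQLD table are assumptions for illustration. The sketch also does not handle !include directives that appear before any section header, and it treats a missing setting as something to review because server defaults vary by version.

```python
import configparser

# Durability settings from the article and the values it recommends.
REQUIRED_MYSQLD = {
    "innodb_flush_log_at_trx_commit": "1",
    "sync_binlog": "1",
}


def check_my_cnf(text):
    """Return a list of problems found in the [mysqld] section of `text`."""
    parser = configparser.ConfigParser(
        allow_no_value=True,             # my.cnf allows bare flags
        strict=False,                    # tolerate repeated sections/options
        interpolation=None,              # treat '%' literally
        inline_comment_prefixes=("#",),  # allow trailing '# comment'
    )
    parser.read_string(text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section found"]
    # MySQL accepts '-' and '_' interchangeably in option names.
    settings = {k.replace("-", "_"): v for k, v in parser.items("mysqld")}
    problems = []
    for name, wanted in REQUIRED_MYSQLD.items():
        actual = settings.get(name)
        if actual is None:
            problems.append(f"{name} is not set (wanted {wanted})")
        elif actual.strip() != wanted:
            problems.append(f"{name}={actual.strip()} (wanted {wanted})")
    return problems
```

A call such as check_my_cnf(open("/etc/my.cnf").read()) returns an empty list when both settings match. The file path is only an example, and the running server's values (SHOW VARIABLES) remain the final authority.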

There are a variety of good backup mechanisms for MySQL databases, including mysqldump (with --single-transaction, useful only for small data sets), Percona XtraBackup, and file system snapshots. Snapshots are especially good when using NetApp or other capable storage. NetApp snapshots can be restored in seconds and cost little in terms of performance overhead.

2. PostgreSQL. PostgreSQL with a trigger-based replication log, for example from Londiste or SLONY, and pg_dump for backups is another good choice. PostgreSQL has unusually good trigger support, for DML changes at least, and permits users to write triggers in a number of languages. Be aware that PostgreSQL triggers do not capture DDL statements like CREATE TABLE, though.

PostgreSQL is fully transactional out of the box and triggers create a fully serialized replication log. It does not have the problem that MySQL does with potentially unsafe table types like MyISAM. However, you need to set a couple of parameters to ensure safe operation. These ensure transactions are committed down to the storage level and prevent catastrophic corruption of the database and/or the WAL (write-ahead log). Since PostgreSQL defaults to these values, you mostly need to avoid turning them off.

fsync = on # turns forced synchronization on or off
synchronous_commit = on # immediate fsync at commit
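Because PostgreSQL defaults to safe values, the useful check is whether anyone has switched them off. The sketch below is a simplified parser for postgresql.conf text, written as an illustration only. The function names are hypothetical, it ignores include directives, and it would misread a '#' inside a quoted value. Parameters that do not appear in the file are assumed to keep their default of on.

```python
# Spellings treated here as "turned off"; an illustrative subset.
OFF_VALUES = {"off", "false", "no", "0"}
DURABILITY_PARAMS = ("fsync", "synchronous_commit")


def parse_postgresql_conf(text):
    """Parse 'name = value  # comment' lines into a dict (later lines win)."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        # The '=' is optional in postgresql.conf, so normalize it to a space.
        name, _, value = line.replace("=", " ", 1).partition(" ")
        settings[name.lower()] = value.strip().strip("'").lower()
    return settings


def unsafe_settings(text):
    """Return the durability parameters that are explicitly turned off."""
    settings = parse_postgresql_conf(text)
    # Absent parameters fall back to the server default, which is 'on'.
    return [p for p in DURABILITY_PARAMS if settings.get(p, "on") in OFF_VALUES]
```

An empty result means neither parameter has been disabled in the file. As with MySQL, the running server (SHOW fsync; SHOW synchronous_commit;) is the final authority.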

Like MySQL, PostgreSQL SQL and APIs are stable and well-known. pg_dump also loads and restores data without difficulty for smallish data sets. For larger data sets file system snapshots work very well.

Regardless of the DBMS type you choose, it is important to avoid application-level features that introduce wormholes, such as the PostgreSQL unlogged tables mentioned in the previous section. Generally speaking, you should only skip transactions if there is a very strong reason for doing so.

Do other database types work for this design pattern? Of course. You can also use a commercial DBMS like Oracle. Oracle fulfills the pattern responsibilities quite well, but is a bit more heavyweight than users want, particularly when operating in the cloud.

And Hardware Implementation, Too...

Even with a properly configured DBMS server you are still not completely out of the woods for data durability. Database servers generally use an fsync() or similar system call to flush data to storage. Unfortunately storage controller cards may just cache the data to be written in local RAM and return. That can fool the DBMS server into thinking transactions are safely stored when they actually are still sitting in memory on a controller card. The "committed" data will then vaporize in a host crash, which in turn can fatally corrupt both MySQL and PostgreSQL stores if you are very unlucky. Just a few bad blocks can cause very serious problems.

Fortunately there is a cure to make data vaporization less likely. On local storage you can invest in RAID with a battery-backed cache (BBU), which keeps power on for the cache even if the host fails completely. SANs and network attached storage tend to have this capability built in. (But check the specifications!) Battery backed cache also tends to be fast, since controllers can safely return from an fsync() operation as soon as the data to be written are in the on-board cache. Without this feature writes to storage can be painfully slow.
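To get a feel for what fsync() costs on a given host, you can time a loop that flushes after every small write, which is roughly what a DBMS does at commit. The following Python sketch is an informal measurement under stated assumptions, not a benchmark: the function name durable_append and the record format are made up for illustration, and a fast result does not prove that data reached stable media, since a controller or drive cache may still be holding it.

```python
import os
import tempfile
import time


def durable_append(path, records):
    """Append each record and fsync after every one; return seconds per record."""
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for record in records:
            os.write(fd, record)
            # Ask the OS to push the data to the device. Whether it reaches
            # stable media depends on the controller and drive caches.
            os.fsync(fd)
    finally:
        os.close(fd)
    return (time.perf_counter() - start) / max(len(records), 1)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "commit.log")
    records = [f"txn {i}\n".encode() for i in range(50)]
    seconds_per_commit = durable_append(log_path, records)
    with open(log_path, "rb") as f:
        written = f.read()
```

Comparing seconds_per_commit across hosts, or against a run with the os.fsync() call removed, gives a rough idea of how much a write cache is helping.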

One interesting question is how to handle cloud environments like Amazon. You just do not know how the storage actually works. (That's really the point of a cloud, after all.) Amazon provides SLAs for performance (example: EBS provisioned IOPS), but there do not seem to be any SLAs about storage consistency. There is a lot to learn here and lessons that apply to Amazon may not necessarily apply to others. I suspect this will prompt some rethinking about data consistency--it's an interesting "what if" for transaction processing to suppose you cannot trust the underlying storage capabilities.

Data loss occurs sooner or later in virtually all systems, but the good news is that you can make it uncommon. For more information check out data sources like this and this. Also, other fabric design patterns like the Fault-Tolerant Data Service keep applications running when failures do occur and can even minimize the effects of data loss. See the upcoming article on that design pattern for more information.

Implementations to Avoid

Here are two examples that do not meet the transactional data service design pattern responsibilities or at least not fully.

1. MySQL with MyISAM table type. MyISAM does not support transactions and is not crash safe. You will lose data or incur downtime fixing problems. MyISAM does not belong in data fabrics.

2. PostgreSQL with streaming replication. Streaming replication replicates log updates in real-time and has the added benefit of permitting queries on replicas. However, streaming replication does not allow you to write to replicas. It therefore does not support online schema maintenance or multi-master replication. It also does not help with heterogeneous replication. Streaming replication is therefore an unattractive choice, even though it is far simpler and works better for ensuring high availability than logical replication solutions like SLONY.

How do NoSQL stores fare in this design pattern? Let's pick on MongoDB. MongoDB supports atomic commit to single BSON documents but does not support transactions across multiple documents. (Unless you think that the hacky two-phase commit proposed by the MongoDB manual is an answer.) Atomic transactions are not one of the reasons people tend to choose NoSQL systems, so this is not surprising. It means that MongoDB cannot handle the responsibilities of this design pattern.

Conclusion and Takeaways

The transactional data service design pattern can be implemented with a carefully configured SQL database. As we have seen, however, there are a number of details about what "carefully configured" really means.

It is a good idea to use the transactional data service design pattern even if you are not planning to implement a data fabric. Systems grow. This pattern gives you the flexibility to build out later by adding other fabric design patterns, for example to introduce cross-site operation using the Multi-Site Data Server pattern. It also protects your data at multiple levels that include transactions as well as regular backups. Nobody really likes losing data if it can be avoided.

Another important point: the transactional data service design pattern keeps your entire fabric working. Losing small amounts of data is typically just an inconvenience for users, especially if it does not occur too often. Broken replication, on the other hand, caused by replicas that diverge or become corrupt after failures, can lead to time-consuming administration and significant downtime to repair. The fabric is a network of servers. Poor configuration on one server can cause problems for multiple others.

Finally, the biggest error people make with this design pattern is to neglect backups. There's something inherently human about it: backups are painful to test so most of us don't. My first and biggest IT mistake involved a problem with backups. It nearly caused a large medical records company to lose a week of data entry records. Put in backups and test them regularly to ensure they work. This is a theme that will recur in later articles.

Speaking of which, the next article in this series will cover the fabric connector design pattern. Stay tuned!

4 comments:

Unknown said...: "(PostgreSQL) UNLOGGED tables. Such tables are not logged and disappear on restart."

That's not true.

The unlogged tables are truncated during instance recovery, and only if the server crashes or an immediate shutdown is performed.; February 7, 2013 at 2:05 AM
Robert Hodges said...: Thank you Frederico! Noted and corrected. Are there other wormholes for PostgreSQL that you know of? PG puts a lot of effort into avoiding this sort of problem so they are harder to find than they are in MySQL.

Further corrections and suggestions for improvement gladly accepted!; February 7, 2013 at 9:58 AM
Marcin said...: The explanation of why PostgreSQL with streaming replication should be avoided is not clear to me. Why is it important to be able to write to replicas? Isn't it enough to write to the primary server, also for online schema changes? Is streaming replication only an unattractive choice or is it as bad as the MySQL MyISAM table type for a transactional data service?; February 8, 2013 at 6:59 AM
Robert Hodges said...: Great questions! PostgreSQL streaming replication is good for making a carbon copy of your DBMS in another location. The primary use case is high availability and it does this well at least as far as protecting you from failures. The problem is that you need more for large systems.

Imagine you have an upgrade where you need to make changes to data in a 50 million row table where your apps must be off while the change is running. Another example is rolling upgrade to a new PostgreSQL version. To do rolling upgrades with zero downtime, you need to use Londiste or SLONY as described in the following: http://itand.me/zero-downtime-upgrades-of-postgresql-with-pgb. Another non-upgrade use case is multi-master replication.

MyISAM is another kettle of fish entirely. The issue with MyISAM is you can't count on your data coming back after a hard restart of any kind. You can use streaming replication to protect you from local failures, especially if you turn on synchronous replication. For that case it's in some ways a better solution than native MySQL replication. MyISAM by contrast is hazardous to your system's health. That's why I said it does not belong in data fabrics.; February 8, 2013 at 3:40 PM

Feb 6, 2013
