<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-768233104244702633</id><updated>2012-01-16T23:54:01.216-08:00</updated><category term='MongoDB'/><category term='IT Industry'/><category term='SaaS'/><category term='Sun'/><category term='MySQL'/><category term='PostgreSQL'/><category term='Cloud Computing'/><category term='Replication'/><category term='SimpleDB'/><category term='Tungsten'/><category term='Proxies'/><category term='MariaDB'/><category term='Apple'/><category term='Oracle'/><category term='Drizzle'/><category term='Java'/><category term='Vertica'/><category term='NoSQL'/><title type='text'>The Scale-Out Blog</title><subtitle type='html'>Creating robust applications using open source databases and commodity hardware</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>93</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-5915303554604006759</id><published>2012-01-06T15:43:00.000-08:00</published><updated>2012-01-06T16:13:56.210-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='MongoDB'/><category scheme='http://www.blogger.com/atom/ns#' term='Oracle'/><title type='text'>Tungsten on the Beach--LA MySQL Meetup on Jan 11, 2012</title><content type='html'>It is my pleasure to announce that I will be presenting on&amp;nbsp;&lt;a href="http://code.google.com/p/tungsten-replicator/"&gt;Tungsten Replicator&lt;/a&gt; next Wednesday, January 11th at the &lt;a href="http://www.meetup.com/la-mysql/"&gt;Los Angeles MySQL Meetup&lt;/a&gt;. The presentation title is &lt;a href="http://www.meetup.com/la-mysql/events/39574632/"&gt;Fast, Flexible, and Fun--The Tungsten Replicator Magical Mystery Tour&lt;/a&gt;. This talk is going to be fun for two reasons. &lt;br /&gt;&lt;br /&gt;First, it's a great opportunity to meet people in the LA MySQL community and talk about my favorite replication software. Tungsten is like a Swiss Army Knife for data replication. &amp;nbsp;It solves a wide range of problems involving HA, scaling, and data movement. &amp;nbsp; The presentation gives a quick intro to the replicator, then surveys how to use the most interesting features, including parallel slave apply, multi-master replication, transaction filtering, and replicating to MongoDB, Oracle, or data warehouses. &amp;nbsp;I'll even show you how to grab the GPL V2 sources from code.google.com and code up your own replicator extensions using Java or Javascript. 
&lt;br /&gt;&lt;br /&gt;Second, the talk&amp;nbsp;venue is in Santa Monica about 10 blocks up from the ocean. &amp;nbsp;Who doesn't like beaches? &amp;nbsp; &amp;nbsp;I certainly do. &amp;nbsp;See you next week!&lt;br /&gt;&lt;br /&gt;p.s.,&amp;nbsp;&amp;nbsp;Thanks to Joe Devon and the other LA MySQL Meetup folks for the kind invitation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-5915303554604006759?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/5915303554604006759/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=5915303554604006759' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5915303554604006759'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5915303554604006759'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2012/01/tungsten-on-beach-la-mysql-meetup-on.html' title='Tungsten on the Beach--LA MySQL Meetup on Jan 11, 2012'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-9009715403036994506</id><published>2011-11-18T23:11:00.000-08:00</published><updated>2011-11-19T08:11:34.543-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Vertica'/><title type='text'>Replicating Data Now and Then with Tungsten</title><content type='html'>What do cruise ship management software and data warehouses have in common? &amp;nbsp;One answer: &amp;nbsp;they both depend on intermittent data replication. &amp;nbsp; Large vessels collect data to share with a home base whenever connectivity permits. &amp;nbsp;If there is no connection, they just wait until later. &amp;nbsp;Data warehouses also do not replicate constantly. &amp;nbsp;Instead, it is often far faster to pool updates and load them in a single humongous batch using SQL COPY commands or native loaders. &amp;nbsp;Replicating updates in this way is sometimes known as &lt;i&gt;batch replication&lt;/i&gt;. &amp;nbsp;&lt;a href="http://code.google.com/p/tungsten-replicator/"&gt;Tungsten Replicator&lt;/a&gt; supports it quite easily.&lt;br /&gt;&lt;br /&gt;To illustrate, we will consider a Tungsten master/slave configuration. &amp;nbsp;(Sample setup instructions &lt;a href="http://code.google.com/p/tungsten-replicator/wiki/TRCBasicInstallation#Install_a_master_/_slave_cluster"&gt;here&lt;/a&gt;.) &amp;nbsp;In this example, MySQL-based web sales data is uploaded to a data warehouse. &amp;nbsp; The master receives constant updates, which are then applied at controlled intervals on the slave. &lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-0_hNHx1Ij7g/TsdBp9h6e7I/AAAAAAAAAJo/A9PIxvsusBY/s1600/master-slave-replication.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="192" src="http://1.bp.blogspot.com/-0_hNHx1Ij7g/TsdBp9h6e7I/AAAAAAAAAJo/A9PIxvsusBY/s320/master-slave-replication.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The first step is to turn off the slave replicator. &amp;nbsp;Log in to the prod2 host and execute the following command. 
&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ trepctl offline&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The prod2 slave will disconnect from the master as well as the data warehouse. &amp;nbsp;Updates now accumulate on the master. &amp;nbsp;We can turn on the slave to fetch and apply them all, then go offline again using one of three methods. &amp;nbsp;The first method uses the current sequence number on the master. &amp;nbsp;Here are sample commands to fetch and apply all transactions from the master up to the current master position.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ trepctl -host prod1 status |grep appliedLastSeqno&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;appliedLastSeqno &amp;nbsp; &amp;nbsp; &amp;nbsp; : 19600&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ trepctl online -seqno 19600&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ trepctl wait -state OFFLINE -limit 300&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;As you can see, the first command locates the master sequence number. &amp;nbsp;The second command tells the slave to go online and replicate to sequence number 19600. &amp;nbsp;Finally, the third command waits until either the slave is back in the offline state or 300 seconds elapse, whichever comes first. &amp;nbsp;This is not strictly necessary for replication but is very handy for scripts, as it eliminates a potentially awkward polling loop. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The second method is to use the MySQL binlog position on the master. &amp;nbsp;The idea is the same as the previous example. 
&amp;nbsp;We get the master binlog position, then tell the slave to apply transactions to that point and go offline. &amp;nbsp;Here's an example:&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ mysql -utungsten -psecret -hprod1 -e 'show master status'&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;+------------------+----------+--------------+------------------+&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;| File &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; | Position | Binlog_Do_DB | Binlog_Ignore_DB |&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;+------------------+----------+--------------+------------------+&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;| mysql-bin.002023 |&amp;nbsp;92395851&amp;nbsp;| &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;|&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;+------------------+----------+--------------+------------------+&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ trepctl online -event mysql-bin.002023:&lt;b&gt;00000000&lt;/b&gt;92395851&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ trepctl wait -state OFFLINE -limit 300&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br 
/&gt;&lt;/div&gt;&lt;div&gt;Note in this example that you must pad the binlog offset out to 16 digits, which means you must add the extra zeros shown in &lt;b&gt;bold&lt;/b&gt;. &amp;nbsp;Tungsten compares native replication IDs as strings, so that we can handle other databases besides MySQL. &amp;nbsp;This is normally a minor inconvenience, unless you don't know the trick. &amp;nbsp;In that case it could be a bit of a head-scratcher. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There is a final way to implement batch replication using Tungsten's built-in heartbeat mechanism. &amp;nbsp; With this method we insert a named heartbeat event on the master, then ask the slave to replicate until the heartbeat appears. &amp;nbsp;Here's an example:&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ trepctl -host prod1 heartbeat -name batch1&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ trepctl online -heartbeat batch1&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ trepctl wait -state OFFLINE -limit 300&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This method is perhaps the simplest of all, because there is no need to check for either sequence numbers or binlog offsets on the master. &amp;nbsp;The only downside is that you must have a master and a slave replicator to use it. &amp;nbsp;It does not work with direct replication, in which a single replicator moves data from the master DBMS to the slave. 
&amp;nbsp;(This limitation will be removed in the future when &lt;a href="http://code.google.com/p/tungsten-replicator/issues/detail?id=228"&gt;Issue 228&lt;/a&gt; is fixed.)&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When using any of these techniques, we may want to know whether Tungsten will really go offline at the correct point. &amp;nbsp;Fortunately, there's a simple way to find out. &amp;nbsp;The trepctl status command shows pending requests to go offline. &amp;nbsp;Let's say you check status after requesting the slave to replicate to a heartbeat as in the previous example. &amp;nbsp;&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ trepctl status&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;Processing status command...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;NAME &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; VALUE&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;---- &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; -----&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;appliedLastEventId &amp;nbsp; &amp;nbsp; : mysql-bin.002023:0000000104369615;37978&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;appliedLastSeqno &amp;nbsp; &amp;nbsp; &amp;nbsp; : 220126&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, 
monospace;"&gt;appliedLatency &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : 470.589&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;offlineRequests &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;: Offline at heartbeat event: batch1&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;state &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;: ONLINE&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;timeInStateSeconds &amp;nbsp; &amp;nbsp; : 2.436&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;uptimeSeconds &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;: 1742.0&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;Finished status command...&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It is simple to see from the status output that Tungsten will go offline when it sees a heartbeat named batch1. &lt;br /&gt;&lt;br /&gt;As this article shows, the &lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;trepctl online&lt;/span&gt; and &lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;trepctl wait&lt;/span&gt; commands make it very simple to implement batch replication. 
&amp;nbsp;You can simplify still further by wrapping the commands in a short script written in your favorite scripting language. &amp;nbsp; Either way you have a handy solution to a problem that affects a diverse set of applications.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;This is not the end of Tungsten features to enable batch replication. &amp;nbsp;Tungsten has a new applier that can submit transactions using CSV files, which is critical to load transactions quickly into data warehouses. &amp;nbsp;We have been testing it out with &lt;a href="http://www.vertica.com/"&gt;Vertica&lt;/a&gt;, where early results show that it improves load performance by a factor of 100 or more in some cases. &amp;nbsp;I will describe this new feature in an upcoming article. &amp;nbsp;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-9009715403036994506?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/9009715403036994506/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=9009715403036994506' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/9009715403036994506'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/9009715403036994506'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/11/replicating-data-now-and-then-with.html' title='Replicating Data Now and Then with Tungsten'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-0_hNHx1Ij7g/TsdBp9h6e7I/AAAAAAAAAJo/A9PIxvsusBY/s72-c/master-slave-replication.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-3699467983366335118</id><published>2011-11-16T23:02:00.000-08:00</published><updated>2011-11-16T23:22:18.015-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Tungsten'/><title type='text'>Why So Many Proprietary Rewrites of MySQL and InnoDB?</title><content type='html'>&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Every couple of weeks or so I get marketing email from a Continuent competitor advertising a closed-source clone of MySQL. It is said to be pumped up on illegal substances and therefore the solution to all my problems. &amp;nbsp;I like this sort of spam because it makes it easier to track what the neighbors are up to. &amp;nbsp;However it does bring up a question. &amp;nbsp;Why are so many companies offering what amount to proprietary replacements of MySQL? &amp;nbsp;This does not mean alternative builds like &lt;/span&gt;&lt;a href="http://www.percona.com/software/percona-server/"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Percona&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt; or &lt;/span&gt;&lt;a href="http://mariadb.org/"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;MariaDB&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;. 
&amp;nbsp;It means products like &lt;/span&gt;&lt;a href="http://www.clustrix.com/"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Clustrix&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;,&amp;nbsp;&lt;/span&gt;&lt;a href="http://www.schoonerinfotech.com/"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Schooner&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;, or &lt;/span&gt;&lt;a href="http://www.xeround.com/"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Xeround&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;, which replace MySQL entirely, or like &lt;/span&gt;&lt;a href="http://www.scaledb.com/"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;ScaleDB&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;, or&amp;nbsp;&lt;/span&gt;&lt;a href="http://www.tokutek.com/"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Tokutek&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;,&amp;nbsp;which replace &lt;/span&gt;&lt;a href="http://www.innodb.com/"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;InnoDB&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;. &amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There's nothing wrong with proprietary software, of course. &amp;nbsp;And there is nothing wrong with rewriting things to make them better. &amp;nbsp;The rewrites are a tribute to the vitality of the MySQL marketplace and in some cases quite clever as well. &amp;nbsp;However, the proprietary offerings tend to obscure an important truth about MySQL. &amp;nbsp;Most businesses that run on open source software have problems with MySQL management, not with MySQL itself. 
&lt;br /&gt;&lt;br /&gt;Here is a simple example. &amp;nbsp;Say you have 2 Terabytes in MySQL 5.1. &amp;nbsp;How do you upgrade from MySQL 5.1 to 5.5 without incurring an application outage? &amp;nbsp;This is a big problem for 24x7 web-facing applications. &amp;nbsp;You don't need to rewrite MySQL to do zero-downtime upgrades. &amp;nbsp;MySQL with InnoDB already works fine. &amp;nbsp;You just need a way to shift connections transparently to a new master database, upgrade the old master, and shift back when you are done. &amp;nbsp;Similar reasoning applies to slave provisioning, automated failover, spreading load over replicas to improve performance, or operating across multiple sites. &lt;br /&gt;&lt;br /&gt;At &lt;a href="http://www.continuent.com/"&gt;Continuent&lt;/a&gt; we concluded a number of years ago that you don't need to change MySQL to manage data effectively. &amp;nbsp;We therefore designed &lt;a href="http://www.continuent.com/solutions/overview"&gt;Tungsten Enterprise&lt;/a&gt;, Continuent's commercial clustering solution, to work with unaltered MySQL. Tungsten Enterprise uses master/slave replication (i.e., &lt;a href="http://code.google.com/p/tungsten-replicator/"&gt;my favorite replicator&lt;/a&gt;), distributed management, and transparent connectivity to make a set of standard MySQL or PostgreSQL servers look like a single highly available DBMS that distributes load across all replicas. &amp;nbsp; This architecture has tremendous advantages, because it complements the strengths of MySQL itself. &amp;nbsp; Here are a few of the principal benefits. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Transparency&lt;/b&gt;. &amp;nbsp;Everything runs on standard MySQL from initial development to large-scale deployment. &amp;nbsp;Application code runs the same way on a dev laptop or production. &amp;nbsp;Application bugs in production are reproducible on the laptop. 
&amp;nbsp;Standard MySQL configuration and tuning also work, because this &lt;u&gt;is&lt;/u&gt; standard MySQL. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;No lock-in&lt;/b&gt;. &amp;nbsp;Don't like Tungsten Enterprise? &amp;nbsp;Use something else or revert to simple MySQL. &amp;nbsp;There's no need to change your database or migrate data. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Data integrity&lt;/b&gt;. &amp;nbsp;InnoDB has had years to shake out bugs, especially those involving data corruption. &amp;nbsp;There are still a few but they do not typically show up unless there is a bad hardware failure or you configure your system incorrectly. &amp;nbsp;(Hint #1: don't use MyISAM.) &amp;nbsp;Do you really want to give this up for a new store implementation? &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Scalability&lt;/b&gt;. &amp;nbsp;MySQL performance is improving constantly, driven by competition between builds, an active community, and investment from Oracle and large web properties like Facebook. &amp;nbsp;SSDs are also increasingly affordable and make a lot of performance problems evaporate. &amp;nbsp;As MySQL improves in this and other areas, you get the benefits. &amp;nbsp;The trick is to have a way to upgrade. &amp;nbsp;I mentioned the MySQL 5.1 to 5.5 upgrade problem for precisely this reason. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Operational costs&lt;/b&gt;. &amp;nbsp;There is a deep pool of administrators and administrative tools for MySQL. &amp;nbsp;Thanks to books like &lt;a href="http://shop.oreilly.com/product/9780596101718.do"&gt;High Performance MySQL&lt;/a&gt;, abundant talks, and a wealth of community resources as well as consulting, there is little mystery about how things work. &amp;nbsp;I probably don't even need to discuss license costs. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Viability&lt;/b&gt;. &amp;nbsp;MySQL is not going anywhere. &amp;nbsp;Oracle is continuing to invest in the core database, and Percona, MariaDB and most important Microsoft will ensure Oracle stays on its toes. 
&amp;nbsp;At Continuent we do our best to keep our friends at Oracle competitive on replication. &amp;nbsp;Innovation on open source MySQL will continue for years to come. &amp;nbsp;(Psst, MySQL guys at Oracle are welcome to come work for us. :)&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Given the number of advantages that off-the-shelf MySQL confers, the real question is why our approach is not more popular. &amp;nbsp;Actually it is. &amp;nbsp;For all the marketing attention generated by proprietary MySQL or InnoDB rewrites, many hundreds of billions of transactions per day run on unaltered MySQL. &amp;nbsp;Switching to proprietary versions of MySQL is a substantial wrench for most businesses, because the economics run so strongly in favor of open source DBMS. &amp;nbsp; However, the open source tools for managing MySQL are by and large inadequate, in part because some of the problems &lt;a href="http://www.xaprb.com/blog/2011/05/04/whats-wrong-with-mmm/"&gt;turn out to be rather difficult to solve&lt;/a&gt;. &amp;nbsp;&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When we started to rethink database clustering at Continuent back in 2007, we therefore focused on solving the problems outside MySQL that make data management hard. &amp;nbsp;That includes building fast replication with global transaction IDs, so you can fail over easily to up-to-date live replicas. &amp;nbsp;It includes building distributed, rule-based management that has simple primitives like "recover" to fix a broken slave. &amp;nbsp;It includes speedy, transparent connectivity that can spread reads intelligently across multiple servers and reroute connections transparently to allow maintenance without halting applications. &amp;nbsp;Finally, it includes simplifying management so that users don't spend much time worrying about their data. 
&amp;nbsp;These capabilities are now very robust and help customers handle hundreds of millions of transactions per day. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It is obvious that off-the-shelf MySQL (and PostgreSQL too) is already very good and continues to get better. &amp;nbsp;For most users there is no need to migrate to proprietary offerings that give up the leverage conferred by open source databases. &amp;nbsp;Tungsten Enterprise solves the difficult problems that are critical to building businesses on standard MySQL. &amp;nbsp; If you are building new systems based on MySQL or scaling old ones, you should look hard at what we have done. &amp;nbsp;&amp;nbsp;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-3699467983366335118?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/3699467983366335118/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=3699467983366335118' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3699467983366335118'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3699467983366335118'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/11/why-so-many-proprietary-rewrites-of.html' title='Why So Many Proprietary Rewrites of MySQL and InnoDB?'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-4712321214305007205</id><published>2011-11-13T20:33:00.000-08:00</published><updated>2011-11-13T23:01:10.484-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='NoSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='IT Industry'/><title type='text'>I Really Dislike Anonymous Attacks</title><content type='html'>If you are interested in NoSQL databases (or maybe not) perhaps you have seen the &lt;a href="http://pastebin.com/raw.php?i=FD3xe6Jt"&gt;anonymous "warning" about using MongoDB&lt;/a&gt;. &amp;nbsp; It concludes with the following pious request: &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp;&lt;span class="Apple-style-span" style="font-family: monospace; white-space: pre-wrap;"&gt;Please take this warning seriously.&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now there are a lot of great resources about data management on the web but the aforementioned rant is not one of them. &amp;nbsp;If you plan to write technical articles and have people take them seriously, here are a few tips. &lt;br /&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;&lt;b&gt;Sign your name&lt;/b&gt;. &amp;nbsp;Readers are more impressed when they see you are not afraid to stand behind your words.&amp;nbsp;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Explain what problem you were trying to solve&lt;/b&gt;. &amp;nbsp;Otherwise uncharitable readers might think you just started pumping information into a new database without thinking about possible consequences and now want to blame somebody else for your bad decision. &amp;nbsp;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Explain how you could do better&lt;/b&gt;. 
&amp;nbsp;Not all designs work out, so propose alternatives. &amp;nbsp;Readers love to see authors demonstrate that they are not discouraged by adversity. &amp;nbsp;&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;As for most of the points made by the anonymous author, all I can say is:&amp;nbsp;&lt;b&gt;well,&amp;nbsp;&lt;/b&gt;&lt;b&gt;&lt;u&gt;&lt;i&gt;duh&lt;/i&gt;&lt;/u&gt;!&lt;/b&gt;&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;MongoDB behavior with respect to&amp;nbsp;&lt;a href="http://www.mongodb.org/display/DOCS/How+does+concurrency+work"&gt;global write locking&lt;/a&gt; and &lt;a href="http://www.mongodb.org/display/DOCS/Durability+and+Repair"&gt;transaction durability&lt;/a&gt;&amp;nbsp;is obvious from the official documentation. &amp;nbsp;These features are not my cup of tea, but it's not as if 10gen is hiding them. &amp;nbsp;Moreover, most people&amp;nbsp;understand that new DBMS implementations have problems, not least of all losing data now and then. &amp;nbsp;You usually pick them because they have features that make it worth putting up with the immaturity. &amp;nbsp;I am not an expert on MongoDB, but I can say from experience that it is amazingly easy to load JSON objects into it. &amp;nbsp;The up-front usability alone demonstrates excellent engineering. &amp;nbsp;I am sure for this reason that there are many other good features. 
&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;p.s., Here is a &lt;a href="http://news.ycombinator.com/item?id=3202081"&gt;point-by-point response from 10gen&lt;/a&gt;, helpfully &lt;a href="http://nosql.mypopescu.com/post/12466059249/anonymous-post-dont-use-mongodb"&gt;pointed out by Alex Popescu&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-4712321214305007205?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/4712321214305007205/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=4712321214305007205' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/4712321214305007205'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/4712321214305007205'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/11/i-really-dislike-anonymous-attacks.html' title='I Really Dislike Anonymous Attacks'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-3286869890190902065</id><published>2011-10-31T07:40:00.000-07:00</published><updated>2011-10-31T11:39:06.497-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Tungsten'/><title type='text'>Benchmarking Tungsten Parallel Replication</title><content type='html'>Tungsten parallel apply on slaves, or parallel replication for short, has been available for about a year. &amp;nbsp; Until recently we did not have many formal benchmarks of its performance. &amp;nbsp;Fortunately the&amp;nbsp;excellent &lt;a href="http://www.percona.com/live/london-2011/"&gt;Percona Live Conference in London&lt;/a&gt; accepted my talk on Tungsten parallel replication (slides &lt;a href="https://s3.amazonaws.com/extras.continuent.com/Tungsten-Parallel-Replication-in-5-Minutes-Final.pdf"&gt;available here&lt;/a&gt;), so &lt;a href="http://datacharmer.blogspot.com/"&gt;Giuseppe Maxia&lt;/a&gt; and I finally allocated a block of time for systematic performance testing. &lt;br /&gt;&lt;br /&gt;In a nutshell, the results were quite good.&amp;nbsp;In the best cases Tungsten parallel apply out-performs single-threaded native replication by about 4.5 to 1. &amp;nbsp;Both Giuseppe and I have verified this using slightly different test methodologies, which helps avoid dumb counting mistakes. &amp;nbsp;Our results also match field tests at a customer site over the previous summer, so we regard them as fairly robust. &amp;nbsp;In the remainder of this article I would like to expand a bit on the details of the benchmarks as well as the results. &amp;nbsp;The results shown here are from my tests. 
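&lt;br /&gt;&lt;br /&gt;For readers who want to check the arithmetic behind the "about 4.5 to 1" figure, here is a minimal Python sketch. &amp;nbsp;It assumes the catch-up times reported in the results table of this article (30 and 17 minutes for the cache-resident case, 228 and 51 minutes for the I/O-bound case). &amp;nbsp;The function and variable names are purely illustrative and are not part of Tungsten or sysbench.

```python
# Catch-up times in minutes, taken from the results table in this article.
CATCH_UP_MINUTES = {
    "cache-resident": {"mysql": 30, "tungsten": 17},
    "io-bound": {"mysql": 228, "tungsten": 51},
}


def speedup(mysql_minutes, tungsten_minutes):
    """Return how many times faster the Tungsten slave caught up."""
    return mysql_minutes / tungsten_minutes


def speedup_table(times):
    """Map each scenario name to its speed-up ratio, rounded to one decimal."""
    return {
        name: round(speedup(t["mysql"], t["tungsten"]), 1)
        for name, t in times.items()
    }


if __name__ == "__main__":
    for scenario, ratio in speedup_table(CATCH_UP_MINUTES).items():
        print(f"{scenario}: {ratio} to 1")
```

The ratio is simply native catch-up time divided by Tungsten catch-up time, so a larger number means a bigger advantage for parallel apply.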
&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;b&gt;Benchmark Test Design&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Both Giuseppe and I used a similar testbed for replication testing: &amp;nbsp; &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;HP ProLiant server, dual Xeon L5520 CPUs with hyper-threading enabled, 72Gb of RAM&lt;/li&gt;&lt;li&gt;1TB HP Smart Array RAID 1+0&amp;nbsp;&lt;/li&gt;&lt;li&gt;CentOS 5.6&lt;/li&gt;&lt;li&gt;XFS file system&lt;/li&gt;&lt;li&gt;MySQL 5.1.57 with InnoDB buffer pool set to 10Gb and using the O_DIRECT flush method&amp;nbsp;&lt;/li&gt;&lt;li&gt;Tungsten Replicator 2.0.5 build 347 &amp;nbsp;&lt;/li&gt;&lt;/ul&gt;For convenience we use &lt;a href="http://mysqlsandbox.net/"&gt;MySQL Sandbox&lt;/a&gt; to set up a master with two slaves, as shown in the following diagram. &amp;nbsp;It turns out that for measuring replication throughput there is no reason to set up on separate hosts, as the master does little or nothing during the test and we only operate one slave at a time. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-ch5MZhTfs2Q/Tq5BOY78QuI/AAAAAAAAAJQ/h7wUqHOPlk0/s1600/parallel-rep-benchmark.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="236" src="http://2.bp.blogspot.com/-ch5MZhTfs2Q/Tq5BOY78QuI/AAAAAAAAAJQ/h7wUqHOPlk0/s400/parallel-rep-benchmark.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The Tungsten Slave is configured as described in a &lt;a href="http://scale-out-blog.blogspot.com/2011/08/adding-parallel-replication-to-mysql-in.html"&gt;previous article in this blog&lt;/a&gt;, except that there are 30 channels instead of 10. 
&amp;nbsp; The exact installation command is given at the end of this article.&lt;br /&gt;&lt;br /&gt;The test run uses &lt;a href="http://sysbench.sourceforge.net/"&gt;sysbench&lt;/a&gt; to spread transactions evenly across 30 databases of identical size, then measures the time to process them. &amp;nbsp;This is also known as a replication catch-up test.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Load all MySQL servers with an identical dataset consisting of 30 databases pre-populated with data from sysbench. &amp;nbsp;Giuseppe wrote a new tool called &lt;a href="http://code.google.com/p/tungsten-toolbox/wiki/Large_Data_generator"&gt;Large Data Generator&lt;/a&gt; that is very helpful for capturing and loading such datasets. &amp;nbsp;&lt;/li&gt;&lt;li&gt;With the slaves shut down, store the master binlog start position and then run 30 sysbench oltp test processes against the master to update and read from all schemas simultaneously for one hour. &amp;nbsp;&lt;/li&gt;&lt;li&gt;Start the MySQL slave from the stored master binlog position and measure time to process the sysbench transactions. Shut down the MySQL slave at the end of the test.&amp;nbsp;&lt;/li&gt;&lt;li&gt;Start the Tungsten slave from the stored master binlog position and measure time to process the sysbench transactions using Tungsten Replicator with 30 channels (i.e. threads).&amp;nbsp;&lt;/li&gt;&lt;/ol&gt;&lt;b&gt;Test Results&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Database performance is substantially different depending on whether data are fully resident in the buffer pool (cache-resident) or largely read from disk (I/O-bound). &amp;nbsp;The speed-up from Tungsten parallel replication over 30 databases, relative to native MySQL replication, varies from 1.8 to 4.5 times depending on which case you look at, as shown in the following table. &amp;nbsp; Processing times are in minutes (m). 
&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Helvetica; font-size: x-large;"&gt;&lt;span class="Apple-style-span" style="font-family: Times;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Helvetica; font-size: x-large;"&gt;&lt;span class="Apple-style-span" style="font-family: Times;"&gt;&lt;table cellpadding="0" cellspacing="0" style="border-collapse: collapse;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;&lt;b&gt;Test Scenario&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;&lt;b&gt;Rows/Db&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;&lt;b&gt;Data Size&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: 
#cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;&lt;b&gt;MySQL Slave&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;&lt;b&gt;Tungsten Slave&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;&lt;b&gt;Ratio&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;Cache-resident&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td 
style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;10K&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;430Mb&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;30m&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;17m&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; 
border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;1.8&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;I/O-Bound&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;10M&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;68Gb&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 
5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;228m&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;51m&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-color: #cbcbcb #cbcbcb #cbcbcb #cbcbcb; border-style: solid; border-width: 1.0px 1.0px 1.0px 1.0px; padding: 0.0px 5.0px 0.0px 5.0px;" valign="middle"&gt;&lt;div style="font: 18.0px Arial; margin: 0.0px 0.0px 0.0px 27.0px; text-indent: -27.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;4.5&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's look at the results in detail. &amp;nbsp;In the cache-resident test the base dataset is relatively small and loads fully into the buffer cache within a minute or two. &amp;nbsp;Both MySQL and Tungsten slaves complete in well under an hour. &amp;nbsp;Here is a graph showing throughput as measured in bytes of binlog processed per 10 second increment. 
&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-mxYXKRE3IP0/Tq5I2QIpgoI/AAAAAAAAAJg/CxPjdSrP1ro/s1600/Cache-Resident-30-Channel-TR-vs-MySQL.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="273" src="http://4.bp.blogspot.com/-mxYXKRE3IP0/Tq5I2QIpgoI/AAAAAAAAAJg/CxPjdSrP1ro/s400/Cache-Resident-30-Channel-TR-vs-MySQL.jpg" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Cache-Resident Slave Catch-Up - MySQL vs. Tungsten Replicator, 30 Databases&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;In the cache-resident case there are virtually no reads from disk as all data are fully resident in the InnoDB buffer pool. &amp;nbsp;Tungsten Replicator is faster because multiple writes can occur in parallel, but the speed-up versus native replication is not especially large. &amp;nbsp;Note that Tungsten processes around 40Mb every 10 seconds or about 1Gb of binlog every four minutes. &lt;br /&gt;&lt;br /&gt;With I/O-bound workloads, on the other hand, we see a profound difference in performance. &amp;nbsp;Tungsten Replicator is at least 6x slower than in the cache-resident case, but still processes updates faster than the master (51 minutes on the slave vs. 60 minutes on the master). &amp;nbsp; Buffer cache loading is correspondingly fast and Tungsten reaches steady-state performance within about 20 minutes. &amp;nbsp;MySQL native replication on the other hand is far slower. The slave not only fails to catch up, but would quickly fall far behind the master under this workload. 
&amp;nbsp;It takes about 90 minutes for native replication even to achieve steady-state performance after buffer pool loading.&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-3re8IjvI654/Tq5I13Fx3pI/AAAAAAAAAJY/2fRXZQEza0A/s1600/IO-Bound-30-Channel-TR-vs-MySQL.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="272" src="http://4.bp.blogspot.com/-3re8IjvI654/Tq5I13Fx3pI/AAAAAAAAAJY/2fRXZQEza0A/s400/IO-Bound-30-Channel-TR-vs-MySQL.jpg" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;I/O-Bound Slave Catch-Up - MySQL vs. Tungsten Replicator, 30 Databases&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-weight: normal;"&gt;Overall we can safely say that single-threaded native replication is likely unworkable in the I/O-bound case without going to some combination of SSDs and/or slave pre-fetch. &amp;nbsp;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-weight: normal;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;b&gt;Further Improvements and Caveats&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The current results of parallel replication benchmarks on Tungsten are gratifying, especially when you consider that two years ago Tungsten Replicator performance was around 10% of the speed of MySQL replication. &amp;nbsp;Nevertheless, these benchmarks are not the final word. &amp;nbsp;It is clear there is room for optimization as we observe that&amp;nbsp;Tungsten processes&amp;nbsp;the cache-resident binlog at least 6 times faster than the I/O-bound workload. &amp;nbsp;Much of the difference seems to be time spent reading from disk. 
&amp;nbsp; If this could be improved, Tungsten would go even faster.&lt;br /&gt;&lt;br /&gt;During the London conference Yoshinori Matsunobu published some &lt;a href="http://yoshinorimatsunobu.blogspot.com/2011/10/making-slave-pre-fetching-work-better.html"&gt;excellent performance results using slave pre-fetch&lt;/a&gt;, which has encouraged us to build pre-fetch into Tungsten as well. &amp;nbsp;&amp;nbsp;I am curious to see if we can further boost throughput by adding pre-fetching on each parallel thread, though other people at the conference such as Domas Mituzas were not optimistic. &amp;nbsp;Either way, I am certain we will improve performance, if not using pre-fetch then with other tricks like batching inserts.&lt;br /&gt;&lt;br /&gt;Finally, some caveats. &amp;nbsp;Our sysbench load is nice because it is evenly distributed across schemas of exactly the same size. &amp;nbsp;Most application workloads do not behave this way, though some do come very close. &amp;nbsp;The slides for my talk discuss practical issues in maximizing performance in real applications. &amp;nbsp;I suspect that a combination of parallelization with pre-fetch will in fact turn out to be a very good solution for a wide variety of workloads. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Fine Print&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;If you would like to repeat our results (or attack them as fraudulent), here are some parameters that may help. 
&amp;nbsp;The database settings in the MySQL sandbox instances are as follows:&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;default-storage-engine=InnoDB&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;innodb-additional-mem-pool-size=100M&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;innodb-flush-method=O_DIRECT&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;innodb-log-buffer-size=4M&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;innodb-log-file-size=50M&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;innodb-thread-concurrency=0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;innodb_buffer_pool_size=10G&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;innodb_file_format=barracuda&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;innodb_file_per_table=1&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;innodb_flush_log_at_trx_commit=2&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;innodb_strict_mode=1&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;log-bin=mysql-bin&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;max-connections=500&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;max_allowed_packet=48M&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;skip_slave_start&lt;/span&gt;&lt;br 
/&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;sync_binlog=0&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Next, here is the sysbench command used to generate load on each schema. &amp;nbsp;We run 30 of these simultaneously varying the database name for each invocation. &amp;nbsp;This example is for the I/O-bound case.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;sysbench --test=oltp --db-driver=mysql --mysql-db=${db} \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--mysql-user=msandbox --mysql-password=msandbox \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--mysql-host=127.0.0.1 --mysql-port=33306 \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--oltp-read-only=off --oltp-table-size=10000000 \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--oltp-index-updates=4 --oltp-non-index-updates=2 \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--max-requests=200000 \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--max-time=3600 --num-threads=5 run&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The replicator configuration is given in the slides for the talk, but here it is again. &amp;nbsp; Options in &lt;span class="Apple-style-span" style="color: red;"&gt;&lt;b&gt;red&lt;/b&gt;&lt;/span&gt; are required for sandboxes. 
&amp;nbsp;Production installations are therefore simpler than what is shown here.&lt;br /&gt;&lt;br /&gt;&lt;div style="color: #333233; font: 16.0px 'Courier New'; line-height: 20.0px; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tools/tungsten-installer --direct -a \&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #333233; font: 16.0px 'Courier New'; line-height: 20.0px; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; --service-name=parallel --native-slave-takeover \&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #333233; font: 16.0px 'Courier New'; line-height: 20.0px; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; --master-host=127.0.0.1 &lt;/span&gt;&lt;/span&gt;&lt;span style="color: #ff0021;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--master-port=33306&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; \&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #333233; font: 16.0px 'Courier New'; line-height: 20.0px; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; --master-user=msandbox --master-password=msandbox&amp;nbsp; 
\&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #333233; font: 16.0px 'Courier New'; line-height: 20.0px; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; --slave-host=127.0.0.1 &lt;/span&gt;&lt;/span&gt;&lt;span style="color: #ff0021;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--slave-port=33307&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; \&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #333233; font: 16.0px 'Courier New'; line-height: 20.0px; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; --slave-user=msandbox --slave-password=msandbox&amp;nbsp; \&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #333233; font: 16.0px 'Courier New'; line-height: 20.0px; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; --home-directory=/opt/continuent \&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #333233; font: 16.0px 'Courier New'; line-height: 20.0px; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; --property=replicator.store.parallel-queue.maxOfflineInterval=5 \&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #333233; font: 16.0px 
'Courier New'; line-height: 20.0px; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; --svc-parallelization-type=disk --buffer-size=100 \&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #ff0021; font: 16.0px 'Courier New'; line-height: 20.0px; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span style="color: black;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--channels=30 --thl-port=2115 --rmi-port=10010 \&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #ff0021; font: 16.0px 'Courier New'; line-height: 20.0px; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; --skip-validation-check=MySQLPermissionsCheck \&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #ff0021; font: 16.0px 'Courier New'; line-height: 20.0px; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; --skip-validation-check=MySQLApplierServerIDCheck \&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #333233; font: 16.0px 'Courier New'; line-height: 20.0px; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; 
--start-and-report&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;To match the results shown above you will also need to assign databases explicitly to channels in the shard.list file. &amp;nbsp; Otherwise, databases will be assigned to channels using a hashing function, which tends to result in somewhat uneven distributions. &amp;nbsp;Look in the comments of the shard.list file for instructions on how to do this.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;Finally, all of our tests depend on two excellent tools from Giuseppe Maxia: &amp;nbsp;&lt;a href="http://mysqlsandbox.net/"&gt;MySQL Sandbox&lt;/a&gt; and the new Large Data Generator program in the &lt;a href="http://code.google.com/p/tungsten-toolbox/"&gt;Tungsten Toolbox&lt;/a&gt;. &amp;nbsp;Once you get the hang of them you will become completely addicted, as they make test setup both reliable and quick. &amp;nbsp;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-3286869890190902065?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/3286869890190902065/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=3286869890190902065' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3286869890190902065'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3286869890190902065'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/10/benchmarking-tungsten-parallel.html' title='Benchmarking Tungsten Parallel Replication'/><author><name>Robert 
Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-ch5MZhTfs2Q/Tq5BOY78QuI/AAAAAAAAAJQ/h7wUqHOPlk0/s72-c/parallel-rep-benchmark.jpg' height='72' width='72'/><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-183325842346489353</id><published>2011-10-01T12:06:00.000-07:00</published><updated>2011-10-01T13:28:59.496-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='IT Industry'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Open Source Hardware</title><content type='html'>Back in 2010 I stopped buying test servers from Dell and began building them from components using Intel i7 processors, X58-based motherboards, and modular power supplies from Ultra. &amp;nbsp;It was a good way to learn about hardware. &amp;nbsp;Besides, it was getting old to pay for Dell desktop systems with Windows, which I would then wipe off when installing Linux. &amp;nbsp;Between the educational value of understanding the systems better, the freedom to select the exact components I wanted, and the ability to fix problems quickly, it has been one of the best investments I have ever made. &amp;nbsp;And it didn't cost any more than equivalent Dell servers.&lt;br /&gt;&lt;br /&gt;For this reason, a couple of recent articles about computer hardware caught my attention. 
&amp;nbsp;First, &lt;a href="http://www.bloomberg.com/news/2011-09-12/dell-loses-orders-as-facebook-do-it-yourself-servers-gain-tech.html"&gt;Dell is losing business as companies like Facebook build their own customized servers&lt;/a&gt;. &amp;nbsp;Open source database performance experts like &lt;a href="http://www.mysqlperformanceblog.com/about/"&gt;Peter Zaitsev&lt;/a&gt; have been talking about full-stack optimization including hardware for years. &amp;nbsp;Google built their original servers using off-the-shelf parts. &amp;nbsp;Vertical integration of applications and hardware has since gone mainstream. &amp;nbsp;If you deploy the same application(s) on many machines, balancing characteristics like cost, performance, and power utilization is no longer a specialist activity but a necessity of business. &amp;nbsp;It's not just about cutting out the Microsoft tax but about many other optimizations as well. &lt;br /&gt;&lt;br /&gt;Second, developments in hardware itself are making custom systems more attractive to a wide range of users. &amp;nbsp;A &lt;a href="http://www.bunniestudios.com/blog/?p=1863"&gt;recent blog post by Bunnie Huang&lt;/a&gt; describes how the slowing rate of CPU clock speed increases means you can get better cost/performance by building optimized, application-specific systems now than by waiting for across-the-board improvements. &amp;nbsp;Stable standards also drive down the difficulty of rolling your own. &amp;nbsp;Components on mid-range servers are sufficiently standardized that it is easier to build basic systems from components than to put together a bicycle from scratch. &amp;nbsp;Try building your own wheels sometime if you don't believe this. &lt;br /&gt;&lt;div&gt;&lt;br /&gt;Easily customizable hardware has important consequences. &amp;nbsp;At a business level, Dell and other mainline hardware vendors will adapt to lower margins, but the market for generic, mid-range appliances has evaporated. 
&amp;nbsp;Starting around 2005 there was a wave of companies trying to sell open source databases, memcached, and datamarts on custom hardware. &amp;nbsp; Most seem to have moved away from hardware, &lt;a href="http://www.dbms2.com/2011/01/28/schooner-software-onl/"&gt;like Schooner&lt;/a&gt;, &amp;nbsp;or folded entirely (like &lt;a href="http://gigaom.com/2010/04/26/gear6-rip/"&gt;Gear6&lt;/a&gt;&amp;nbsp;and &lt;a href="http://www.infoworld.com/d/the-industry-standard/teradata-buys-analytics-vendor-kickfire-496"&gt;Kickfire&lt;/a&gt;). &amp;nbsp;The long-term market for such appliances, to the extent it exists, is in the cloud. &lt;br /&gt;&lt;br /&gt;The other consequence is potentially far more significant. &amp;nbsp;The traditional walls that encapsulated hardware and software design are breaking down. &amp;nbsp;Big web properties or large ISPs like Rackspace run lean design teams that integrate hardware with open source software deployment. &amp;nbsp;This is not just a matter of software engineers learning about hardware or vice versa. &amp;nbsp;It is the tip of a much bigger iceberg. &amp;nbsp;Facebook recently started the &lt;a href="http://opencompute.org/"&gt;Open Compute Project&lt;/a&gt;, which is a community-based effort to design server infrastructure. &amp;nbsp; In their own words:&lt;br /&gt;&lt;blockquote&gt;&lt;i&gt;By releasing Open Compute Project technologies as open hardware, our goal is to develop servers and data centers following the model traditionally associated with open source software projects. That’s where you come in.&lt;/i&gt;&lt;/blockquote&gt;Facebook and others are opening up data center design. &amp;nbsp;Gamers have been building their own systems for years. &amp;nbsp;Assuming Bunnie's logic is correct, open hardware will apply to a wide range of devices from phones up to massive clusters. 
&amp;nbsp;Community-based, customized system designs are no longer an oddity but part of a broad movement that will change the way all of us think about building and deploying applications on any kind of physical hardware. &amp;nbsp;It will upset current companies but also create opportunities for new kinds of businesses. &amp;nbsp;The "cloud" is not the only revolution in computing. &amp;nbsp;Open source hardware has arrived. &amp;nbsp;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-183325842346489353?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/183325842346489353/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=183325842346489353' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/183325842346489353'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/183325842346489353'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/10/open-source-hardware.html' title='Open Source Hardware'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-8881432212802028854</id><published>2011-09-29T21:57:00.000-07:00</published><updated>2011-09-29T21:58:17.001-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='MongoDB'/><category scheme='http://www.blogger.com/atom/ns#' term='NoSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Tungsten'/><title type='text'>Quick Installation of Replication from MySQL to MongoDB</title><content type='html'>Proof-of-concept &lt;a href="http://code.google.com/p/tungsten-replicator/"&gt;Tungsten&lt;/a&gt; support for&amp;nbsp;&lt;a href="http://www.mongodb.org/"&gt;MongoDB&lt;/a&gt; arrived last May, when I posted about our &lt;a href="http://scale-out-blog.blogspot.com/2011/05/introducing-mysql-to-mongodb.html"&gt;hackathon effort to replicate from MySQL to MongoDB&lt;/a&gt;. &amp;nbsp;That code then lay fallow for a few months while we worked on other things like parallel replication, but the period of idleness has ended. &amp;nbsp;Earlier this week I checked in fixes to Tungsten Replicator to add &lt;a href="http://code.google.com/p/tungsten-replicator/issues/detail?id=229"&gt;one-line installation support for MongoDB slaves&lt;/a&gt;.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;MySQL to MongoDB replication will be officially supported in the Tungsten Replicator 2.0.5 build, which will be available in a few weeks. &amp;nbsp;However, you can try out MySQL to MongoDB replication right now. &amp;nbsp;Here is a quick how-to using my lab hosts logos1 for the MySQL master and logos2 for the MongoDB slave.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. Download the latest development build of Tungsten Replicator. 
&amp;nbsp; See the &lt;a href="http://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/index.html"&gt;nightly builds page&lt;/a&gt; for S3 URLs.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ cd /tmp&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ wget --no-check-certificate https://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/tungsten-replicator-2.0.5-332.tar.gz&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2. Untar and cd into the release.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ tar -xzf tungsten-replicator-2.0.5-332.tar.gz&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ cd tungsten-replicator-2.0.5-332&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;3. Install a MySQL master replicator on a host that has MySQL installed and is configured to use row replication, i.e. &lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;binlog_format=row&lt;/span&gt;. &amp;nbsp;Note that you need to enable the colnames and pkey filters. &amp;nbsp;These add column names to row updates and eliminate update and delete query columns other than those corresponding to the primary key, respectively. Last but not least, ensure strings are converted to Unicode rather than transported as raw bytes, which we have to do in homogeneous MySQL replication to finesse character set issues. 
&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ tools/tungsten-installer --master-slave -a \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--datasource-type=mysql \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--master-host=logos1 &amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--datasource-user=tungsten &amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--datasource-password=secret &amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--service-name=mongodb \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--home-directory=/opt/continuent \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--cluster-hosts=logos1 \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--mysql-use-bytes-for-string=false \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--svc-extractor-filters=colnames,pkey \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, 
monospace;"&gt;&amp;nbsp;&amp;nbsp;--svc-parallelization-type=disk --start-and-report&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;4. Finally, install a MongoDB slave. &amp;nbsp;Before you do this, ensure mongod 1.8.x is up and running on the host as described in the &lt;a href="http://scale-out-blog.blogspot.com/2011/05/introducing-mysql-to-mongodb.html"&gt;original blog post&lt;/a&gt; on MySQL to MongoDB replication. &amp;nbsp; My mongod is running on the default port of 27017, so there is no --slave-port option necessary.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ tools/tungsten-installer --master-slave -a \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--datasource-type=mongodb \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--master-host=logos1 &amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--datasource-user=tungsten &amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--datasource-password=secret &amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--service-name=mongodb \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--home-directory=/opt/continuent \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, 
monospace;"&gt;&amp;nbsp;&amp;nbsp;--cluster-hosts=logos2 \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--skip-validation-check=InstallerMasterSlaveCheck \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--svc-parallelization-type=disk --start-and-report&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;That's it. &amp;nbsp;You test replication by logging into MySQL on the master, adding a row to a table, and confirming it reaches the slave. &amp;nbsp; First the SQL commands:&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ mysql -utungsten -psecret -hlogos1 test&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;Welcome to the MySQL monitor. 
&amp;nbsp;Commands end with ; or \g.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; create table bar(id1 int primary key, data varchar(30));&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;Query OK, 0 rows affected (0.15 sec)&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; insert into bar values(1, 'hello from mysql');&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;Query OK, 1 row affected (0.00 sec)&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now check the contents of MongoDB: &amp;nbsp;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ mongo logos2:27017/test&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;MongoDB shell version: 1.8.3&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;connecting to: logos2:27017/test&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;system.indexes&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; db.bar.find()&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span 
class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;{ "_id" : ObjectId("4e85269484aef8fcae4b0010"), "id1" : "1", "data" : "hello from mysql" }&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;Voila! &amp;nbsp;We may still have bugs, but at least MySQL to MongoDB replication is now easy to install. &amp;nbsp;&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Speaking of bugs, I have been fixing problems as they pop up in testing. &amp;nbsp;The most significant improvement is a feature I call auto-indexing on MongoDB slaves. &amp;nbsp;MongoDB materializes collections automatically when you put in the first update, but it does nothing about indexes. &amp;nbsp;My first TPC-B runs processed less than 100 transactions per second on the MongoDB slave, which is pretty pathetic.&amp;nbsp;The bottleneck is due to MongoDB update operations of the form 'db.account.findAndModify(myquery,mydoc)'. &amp;nbsp;You must index properties used in the query or things will be very slow. &amp;nbsp;&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Auto-indexing cures the update bottleneck by ensuring that there is an index corresponding to the SQL primary key for any table that we update. &amp;nbsp;MongoDB makes this logic very easy to implement--you can issue a command like 'db.account.ensureIndex({account_id:1})' to create an index. &amp;nbsp;What's really cool is that MongoDB will do this even if the collection is not yet materialized--e.g., before you load data. &amp;nbsp; It seems to be another example of how MongoDB collections materialize whenever you refer to them, which is a very useful feature. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;TPC-B updates into MongoDB are now running at over 1000 transactions per second on my test hosts. I plan to fix more bugs and goose up performance still further over the next few weeks. 
&amp;nbsp;Through MongoDB we are unlearning assumptions within Tungsten, a step that is necessary to work with non-relational databases. &amp;nbsp;It's great preparation for big game hunting next year: &amp;nbsp;replication to &lt;a href="http://hbase.apache.org/"&gt;HBase&lt;/a&gt; and &lt;a href="http://cassandra.apache.org/"&gt;Cassandra&lt;/a&gt;. &amp;nbsp;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-8881432212802028854?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/8881432212802028854/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=8881432212802028854' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/8881432212802028854'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/8881432212802028854'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/09/quick-installation-of-replication-from.html' title='Quick Installation of Replication from MySQL to MongoDB'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-9092965487743804441</id><published>2011-09-08T12:15:00.000-07:00</published><updated>2011-09-08T14:44:04.718-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='MongoDB'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Oracle'/><category scheme='http://www.blogger.com/atom/ns#' term='Tungsten'/><title type='text'>What's Next for Tungsten Replicator</title><content type='html'>As Giuseppe Maxia &lt;a href="http://datacharmer.blogspot.com/2011/09/tungsten-replicator-204-released.html"&gt;recently posted&lt;/a&gt;, we released &lt;a href="http://code.google.com/p/tungsten-replicator/downloads/list"&gt;Tungsten Replicator 2.0.4&lt;/a&gt; this week. &amp;nbsp;It has a raft of bug fixes and new features, of which one-line installations are the single biggest improvement. &amp;nbsp;I set up replicators dozens of times a day, and having a single command for standard cluster topologies is a huge step forward. &amp;nbsp;Kudos to Jeff Mace for getting this nailed down. &lt;br /&gt;&lt;br /&gt;So what's next? &amp;nbsp;You can see what we are up to in general by looking at our &lt;a href="http://code.google.com/p/tungsten-replicator/issues/list"&gt;issues list&lt;/a&gt;. &amp;nbsp;We cannot do everything at once, but here are the current priorities for Tungsten Replicator 2.0.5.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Parallel replication speed and robustness. &amp;nbsp;I'm currently working on eliminating choke points in performance (like &lt;a href="http://code.google.com/p/tungsten-replicator/issues/detail?id=168"&gt;this one&lt;/a&gt;) as well as eliminating corner cases that cause the replicator to require manual intervention, such as &lt;a href="http://code.google.com/p/tungsten-replicator/issues/detail?id=148"&gt;aging out logs that are still needed by slaves&lt;/a&gt;. &amp;nbsp;&lt;/li&gt;&lt;li&gt;Multi-master replication. 
&amp;nbsp;This includes better support for &lt;a href="http://scale-out-blog.blogspot.com/2011/08/system-of-record-approach-to-multi.html"&gt;system of record architectures&lt;/a&gt;, many masters to one slave, and replication between the same databases on different sites. &amp;nbsp;Stephane Giron nailed a &lt;a href="http://code.google.com/p/tungsten-replicator/issues/detail?id=209"&gt;key MyISAM multi-master bug&lt;/a&gt; for the last release. &amp;nbsp;We will continue to polish this as we work through our current projects. &amp;nbsp;&amp;nbsp;&lt;/li&gt;&lt;li&gt;Better installations for more types of databases. &amp;nbsp;Jeff recently hacked in support for PostgreSQL as well as Oracle slaves, and we are contemplating the addition of MongoDB support. &amp;nbsp;Heterogeneous replication is getting simpler to set up. &amp;nbsp;&lt;/li&gt;&lt;li&gt;Filter usability. &amp;nbsp;Giuseppe has &lt;a href="http://code.google.com/p/tungsten-replicator/wiki/Replicator_filter_infrastructure_and_interface"&gt;a list of improvements for filters&lt;/a&gt;, which are one of the most powerful Tungsten Replicator features but not as easy for non-developers to use as we would like. &amp;nbsp;Better installation support is first on the list, followed by the ability to load and unload filters dynamically. &amp;nbsp;&lt;/li&gt;&lt;li&gt;Data warehouse loading. &amp;nbsp;We have a design for fast data warehouse loading that I hope we'll be able to implement in the next few weeks. &amp;nbsp;&lt;a href="http://flyingclusters.blogspot.com/"&gt;Linas Virbalas&lt;/a&gt; has also been working on this problem along with a number of other heterogeneous projects for customers. &amp;nbsp;&lt;/li&gt;&lt;/ul&gt;This is a lot of work and not everything will necessarily be finished when 2.0.5 goes out. &amp;nbsp;However, I hope we'll make progress on all of them. &amp;nbsp;In case you are wondering how we pick things, replicator development is largely driven by customer projects. 
&amp;nbsp;&amp;nbsp;If you have something you need in the replicator, please &lt;a href="http://www.continuent.com/"&gt;contact Continuent&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;After this build we will... Er, let's get 2.0.5 done first. &amp;nbsp;Suffice it to say we have a long list of useful and interesting features to discuss in future blog articles.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-9092965487743804441?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/9092965487743804441/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=9092965487743804441' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/9092965487743804441'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/9092965487743804441'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/09/whats-next-for-tungsten-replicator.html' title='What&apos;s Next for Tungsten Replicator'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-6737798747290329829</id><published>2011-09-06T21:56:00.000-07:00</published><updated>2011-09-10T11:33:08.557-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='IT Industry'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Apple'/><title type='text'>The Inimitable Mr. Steven Jobs</title><content type='html'>There have been countless articles praising Steve Jobs since &lt;a href="http://online.wsj.com/article/SB10001424053111904875404576528981250892702.html"&gt;he announced his retirement from Apple on August 25th&lt;/a&gt;. &amp;nbsp;Most either &lt;a href="http://www.nytimes.com/2011/08/27/opinion/nocera-what-makes-steve-jobs-great.html"&gt;catalogue Steve Jobs's many triumphs&lt;/a&gt; or &lt;a href="http://online.wsj.com/article/SB10001424053111904787404576530362000438254.html"&gt;assess the impact of his creativity on society&lt;/a&gt;. &amp;nbsp;Those are entertaining topics but not especially useful. &amp;nbsp;A more practical question is why Steve Jobs is so good at creating new products and whether the rest of us can imitate him.&lt;br /&gt;&lt;br /&gt;Steve Jobs's best work seems to follow a repeated pattern. &amp;nbsp;Let's call it the Apple pattern, though of course it could just as well be the Pixar pattern or NeXT pattern:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;See the whole picture of some crucial human/technology interaction and recognize gaps. &amp;nbsp;&lt;/li&gt;&lt;li&gt;Design products that fill those gaps by combining artistic sensibility and innovative technology.&lt;/li&gt;&lt;li&gt;Get a large organization to implement designs in a way that makes the end result look like the handiwork of a single highly focused craftsman.&amp;nbsp;&lt;/li&gt;&lt;/ol&gt;Two things about the pattern seem particularly striking. &amp;nbsp;First, Steve Jobs is a complete package. &amp;nbsp;I have been in the tech industry for over three decades and have met people who did one or at most two of these things at the level necessary to create products that move large markets. &amp;nbsp;Almost nobody does all three. 
&amp;nbsp;The fact that Steve is excellent in all areas simultaneously may be a root cause behind his long run of successes. &lt;br /&gt;&lt;br /&gt;Second, Jobs's ability to drive implementation teams is extraordinary. &amp;nbsp;Maybe it's just the manager in me, but I find his ability to pick the right people to run teams and to keep those teams pointed in a clear direction without product-destroying compromises quite remarkable. &amp;nbsp;This is far harder than generating ideas in the first place. &amp;nbsp;The heart of the Apple pattern is as much about understanding people as technology--not just users but the creators as well. &amp;nbsp;I have never heard Jobs make pronouncements on team management, but there is an &lt;a href="http://www.youtube.com/watch?v=k2h2lvhzMDc"&gt;excellent talk from Ed Catmull of Pixar&lt;/a&gt; that summarizes the tensions quite well. &lt;br /&gt;&lt;br /&gt;Steve Jobs is commonly compared to great inventors like Edison, Ford, and Disney. &amp;nbsp;When thinking about imitation, another parallel seems more illuminating: &lt;a href="http://en.wikipedia.org/wiki/John_Churchill,_1st_Duke_of_Marlborough"&gt;John Churchill, Duke of Marlborough&lt;/a&gt;, hands down the greatest English general of all time. 
&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-YQKjjcE_fng/TmbPga4j8qI/AAAAAAAAAI8/WNNaav6i5Og/s1600/John_Churchill_Marlborough_portra%25CC%2588tterad_av_Adriaen_van_der_Werff_%25281659-1722%2529.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="320" src="http://4.bp.blogspot.com/-YQKjjcE_fng/TmbPga4j8qI/AAAAAAAAAI8/WNNaav6i5Og/s320/John_Churchill_Marlborough_portra%25CC%2588tterad_av_Adriaen_van_der_Werff_%25281659-1722%2529.jpg" width="261" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;A possible Jobs ancestor?&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Marlborough possessed a seldom equalled ability to see war as an integrated whole across geography and branches of arms, devise unexpected strategies to exploit the weaknesses of his enemies, and execute them flawlessly in the difficult conditions of early 18th Century campaigns. &amp;nbsp;Execution extended from handling fractious allies down to the painstaking work to ensure his men had proper meals after each day's march. &amp;nbsp;In other words: &amp;nbsp;analogous problem-solving abilities to Steve Jobs, translated into the field of warfare. &amp;nbsp; The parallel extends to the lavish praise of contemporaries and later historians. &amp;nbsp;Winston Churchill famously described Marlborough as follows. 
&amp;nbsp;&lt;/div&gt;&lt;blockquote&gt;He commanded the armies of Europe against France for ten campaigns. He fought four great battles and many important actions&amp;nbsp;... He never fought a battle that he did not win, nor besieged a fortress that he did not take&amp;nbsp;... He quitted war invincible.&lt;/blockquote&gt;Grand problem-solvers like Marlborough and Jobs are sufficiently rare they tend to be one-offs who change society but leave no obvious successors. &amp;nbsp;English military superiority on the Continent waned after Marlborough's retirement. &amp;nbsp;Something similar will likely befall Apple after Jobs, current &lt;a href="http://moneywatch.bnet.com/investing/blog/investment-insights/apple-after-steve-jobs-poised-for-years-of-success/2415/"&gt;happy talk&lt;/a&gt; about product pipelines and cash position notwithstanding. &amp;nbsp;It is simply not possible to imitate Jobs by committee, which is effectively what will happen once he is completely absent. &amp;nbsp;The driving force is gone.&lt;br /&gt;&lt;br /&gt;That said, we can all imitate Steve Jobs, albeit on a smaller scale. &amp;nbsp;Many highly successful products start with a single person who conceives the idea and drives at least the first couple of iterations to completion. &amp;nbsp;Seeing the whole problem, applying innovative designs to solve it, and managing the team to get it done is a fundamental pattern that applies across a wide range of endeavors. &amp;nbsp;Here is just one of many examples. &lt;br /&gt;&lt;br /&gt;Many years ago at Sybase I worked for a manager named Mark Deppe. &amp;nbsp;Early in the 1990s Mark learned that Wall Street firms were patching together crude publish/subscribe messaging applications to move data between financial systems in order to speed up trades. 
&amp;nbsp; He recognized that there was a much better way to do this using log-based data replication and built the &lt;a href="http://www.sybase.com/products/businesscontinuity/replicationserver"&gt;Sybase Replication Server&lt;/a&gt; product. &amp;nbsp;The Rep Server went on to generate hundreds of millions of dollars in sales. &amp;nbsp;It still sells well today, over 15 years later. &amp;nbsp;Mark was a great architect but also a great builder of teams. &amp;nbsp;He paid as much or more attention to hiring and managing people as he did to technology. &amp;nbsp;He trusted the people he hired, and he gave them the freedom and support to do great work. &amp;nbsp;At the same time Mark was also incredibly attentive to detail and did all the project management for the first releases himself. &amp;nbsp;Years later he said it was too important a task to hand off to anyone else.&lt;br /&gt;&lt;br /&gt;Mark Deppe was the best technical manager I ever worked for. &amp;nbsp;I have consciously imitated his best practices for many years. &amp;nbsp;Looking back it seems I was unconsciously imitating the Apple design pattern. &amp;nbsp;But perhaps that was not a complete coincidence. &amp;nbsp; Before joining Sybase Mark was at Apple where he worked with (guess who?) Steve Jobs.&lt;br /&gt;&lt;br /&gt;------------------&lt;br /&gt;NOTE: &amp;nbsp;After this article was published I found the flow hard to understand and edited it a week or so later to make it more readable. &amp;nbsp;The argument is the same as before. 
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-6737798747290329829?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/6737798747290329829/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=6737798747290329829' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/6737798747290329829'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/6737798747290329829'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/09/inimitable-mr-steven-jobs.html' title='The Inimitable Mr. Steven Jobs'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-YQKjjcE_fng/TmbPga4j8qI/AAAAAAAAAI8/WNNaav6i5Og/s72-c/John_Churchill_Marlborough_portra%25CC%2588tterad_av_Adriaen_van_der_Werff_%25281659-1722%2529.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-8949677719498934930</id><published>2011-08-30T01:51:00.000-07:00</published><updated>2011-08-31T07:40:19.348-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='SaaS'/><category scheme='http://www.blogger.com/atom/ns#' term='Tungsten'/><title 
type='text'>Practical Multi-Master Replication using Shard Filters</title><content type='html'>Earlier this month I published an article on this blog describing the &lt;a href="http://scale-out-blog.blogspot.com/2011/08/system-of-record-approach-to-multi.html"&gt;system of record approach to multi-master replication&lt;/a&gt;. &amp;nbsp;As mentioned in that article, my colleagues and I at Continuent have been working on improving Tungsten to make system of record design patterns easier to implement. &amp;nbsp;This article describes how to set up a system of record using &lt;a href="http://code.google.com/p/tungsten-replicator/"&gt;Tungsten Replicator&lt;/a&gt; shard filters, which are a new feature in Tungsten 2.0.4. &amp;nbsp;By doing so we will create a multi-master configuration that avoids replication loops and transaction conflicts. &amp;nbsp;On top of that, it is quite easy to set up.&lt;br /&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are many possible system of record patterns depending on how many schemas are shared and across how many masters. &amp;nbsp;The following diagram shows three of them. &amp;nbsp;In contrast to many so-called MySQL multi-master implementations, all masters are live and accept updates. &amp;nbsp;(Schemes such as &lt;a href="http://yoshinorimatsunobu.blogspot.com/2011/08/mysql-mha-support-for-multi-master.html"&gt;MySQL-MHA&lt;/a&gt;, for example, make extra masters read-only. &amp;nbsp;Don't be fooled!) 
&amp;nbsp;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-duGJhDfOA4A/Tlu15ygBXXI/AAAAAAAAAI4/GTmlDjP6Q2M/s1600/multi-master-variations.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="264" src="http://2.bp.blogspot.com/-duGJhDfOA4A/Tlu15ygBXXI/AAAAAAAAAI4/GTmlDjP6Q2M/s640/multi-master-variations.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: auto;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For today's exercise we will implement the basic system of record. &amp;nbsp;Once you understand this you can quickly set up other multi-master scenarios. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Defining Shard Master Locations&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The first step is to tell Tungsten where each shard is mastered. &amp;nbsp;By "mastered" we mean that one master receives application updates for that shard, whereas all other masters hold copies only or may not contain the shard at all. &amp;nbsp;Tungsten reads shard definitions in a variant of CSV (comma-separated values) format in which the first line contains column names. &amp;nbsp;You can have any amount of whitespace between entries. 
&amp;nbsp;Create a file called shard.map with your favorite editor and type in the following lines.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;b&gt;shard_id&lt;/b&gt;&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;b&gt;	&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;b&gt;master&lt;/b&gt;&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;b&gt;	&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;b&gt;critical&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tungsten_nyc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;nyc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;false&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tungsten_sjc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, 
monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;sjc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;false&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;acme&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; sjc &amp;nbsp; &amp;nbsp; false&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;pinnacle &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;nyc &amp;nbsp; &amp;nbsp; false&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The first column is the name of the shard. &amp;nbsp;This must be unique--because a shard can only live on one master. &amp;nbsp;The next column is the "home" master for the shard. &amp;nbsp;This is the one and only master that should receive shard updates. &amp;nbsp;The third column defines whether the shard is critical and requires full serialization. &amp;nbsp;It will be linked to parallel replication in a later release. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It turns out you do not need to add entries for Tungsten catalog schemas such as tungsten_nyc. &amp;nbsp;Tungsten Replicator will create them automatically. &amp;nbsp;They are shown here for completeness only. 
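&lt;br /&gt;&lt;br /&gt;Because each shard may appear only once, it can help to sanity-check the map before loading it. &amp;nbsp;The following is a minimal sketch of such a check and not a Tungsten tool: the check_shard_map function name is hypothetical, and the sketch assumes three whitespace-separated columns with the column names on the first line, as described above.

&lt;pre&gt;
```shell
#!/bin/bash
# Hypothetical pre-load check for a shard map; not part of Tungsten.
# Assumes whitespace-separated columns (shard_id, master, critical)
# with column names on the first line.
check_shard_map() {
  awk '
    NR == 1 { next }    # skip the header line
    NF == 0 { next }    # ignore blank lines
    NF != 3 {
      printf "line %d: expected 3 columns, found %d\n", NR, NF
      bad = 1
      next
    }
    seen[$1]++ {        # true from the second occurrence of a shard_id
      printf "line %d: duplicate shard_id %s\n", NR, $1
      bad = 1
    }
    $3 != "true" {
      if ($3 != "false") {
        printf "line %d: critical must be true or false\n", NR
        bad = 1
      }
    }
    END { exit bad + 0 }  # exit status 1 if any problem was reported
  ' "$1"
}
```
&lt;/pre&gt;

Call the function with the map file name as its only argument. &amp;nbsp;A non-zero exit status means the file should be corrected before it is loaded into a replication service.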
&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Creating Replication Services&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Next we need to define services to replicate bi-directionally between DBMS servers and set options to filter shards using the &lt;a href="http://code.google.com/p/tungsten-replicator/source/browse/trunk/replicator/src/java/com/continuent/tungsten/replicator/shard/ShardFilter.java"&gt;ShardFilter&lt;/a&gt; class, which is new in Tungsten 2.0.4. &amp;nbsp;The shard filter helps ensure that shards replicate from their home masters only and not from other locations. &amp;nbsp; If you do not know what replication services are, you can find a description of them in &lt;a href="http://scale-out-blog.blogspot.com/2011/03/slouching-towards-multi-master-conflict.html"&gt;this article&lt;/a&gt;. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Multi-master replication is easy to misconfigure, so to prevent accidents we will tell the shard filter to generate an error any time it processes a shard it has never seen before. &amp;nbsp;The replication service will immediately fail, which signals that we have to update shard definitions. &amp;nbsp;This is the safest way to implement a system of record or any multi-master configuration for that matter. &amp;nbsp;It is generally easier to restart replication after correcting the configuration than to repair mixed-up data, which can lead to major outages. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The first step is to set up replication services for each master. &amp;nbsp;These read the binlog and make transactions available to slave replication services. &amp;nbsp;Here are the commands. &amp;nbsp; Note that the sjc master is on host logos1, while the nyc master is on logos2. 
&amp;nbsp;The remaining examples use these names consistently.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;# Define common master settings.&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;COMMON_MASTER_OPTS="--datasource-user=tungsten --datasource-password=secret \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;--home-directory=/opt/continuent --svc-parallelization-type=disk \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--svc-extractor-filters=shardfilter \&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--property=replicator.filter.shardfilter.unknownShardPolicy=error"&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;# Set up sjc master.&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tools/tungsten-installer --master-slave -a --master-host=logos1 \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--cluster-hosts=logos1 --service-name=sjc 
$COMMON_MASTER_OPTS --start-and-report&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;# Set up nyc master.&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tools/tungsten-installer --master-slave -a --master-host=logos2 \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--cluster-hosts=logos2 --service-name=nyc $COMMON_MASTER_OPTS --start-and-report&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The --svc-extractor-filters option adds shard filtering immediately after event extraction. &amp;nbsp;The unknownShardPolicy=error setting will cause the masters to die if they process an undefined shard. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now we can define the remote slave services for sjc and nyc. &amp;nbsp;These are special slaves that write transactions onto another master as opposed to a normal slave. &amp;nbsp;We would like slave services to error out on unknown shards as well. &amp;nbsp;Also (and this is important) we want them to enforce shard homes. &amp;nbsp;Here are the commands to create the services and start each one. 
&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;COMMON_SLAVE_OPTS="--release-directory=/opt/continuent/tungsten \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--service-type=remote --allow-bidi-unsafe=true --svc-parallelization-type=disk \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--svc-applier-filters=shardfilter \&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--property=replicator.filter.shardfilter.unknownShardPolicy=error \&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--property=replicator.filter.shardfilter.enforceHome=true"&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;# Set up sjc remote slave.&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tools/configure-service -C -a --host=logos2 \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--local-service-name=nyc --role=slave \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--datasource=logos2 --master-host=logos1 $COMMON_SLAVE_OPTS sjc&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', 
Courier, monospace;"&gt;trepctl -host logos2 -service sjc start&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;# Set up nyc remote slave.&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tools/configure-service -C -a --host=logos1 \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--local-service-name=sjc --role=slave \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;--datasource=logos1 --master-host=logos2 $COMMON_SLAVE_OPTS nyc&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;trepctl -host logos1 -service nyc start&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The --svc-applier-filters option adds shard filtering before applying to the DBMS. &amp;nbsp;The unknownShardPolicy=error setting will cause the slaves to die if they process an undefined shard. &amp;nbsp;Finally, the enforceHome=true option means that each slave will drop any transaction that lives on a different service from that slave's master. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;At the end of this procedure, your services should be online and ready to run. &amp;nbsp;Use 'trepctl services' to make sure. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Loading Shard Definitions&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;To make shard definitions take effect, you must load the shard.map contents into each replication service. 
&amp;nbsp;You can do this any time the replicator is running but after loading new definitions you must put the replicator online again. &amp;nbsp;Here are the commands to load the shard maps onto each of the four replication services. &amp;nbsp; For each replication service, you must delete the old definitions, reload new ones, and get the replicator to go online again.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;#!/bin/bash&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;MAP=shard.map&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;for host in logos1 logos2&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;do&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;for service in sjc nyc&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;do&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;trepctl -host $host -service $service shard -deleteAll&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;trepctl -host $host -service $service shard -insert &amp;lt; $MAP&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;trepctl -host $host -service $service offline&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span 
class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;trepctl -host $host -service $service wait -state OFFLINE&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;trepctl -host $host -service $service online&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;done&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;done&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This looks a little clunky and will be reduced to a single command instead of five in a later release. &amp;nbsp;I put it in a script to make it quicker to run. &amp;nbsp;The good news is that there is just one shard map that works for all replication services, regardless of location or role. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Once you finish this step, you can go to any replication service and list the shards it knows about. 
&amp;nbsp;Let's pick a service and demonstrate:&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ trepctl -host logos1 -service sjc shard -list&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;shard_id&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;master&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;critical&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tungsten_nyc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;nyc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;false&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tungsten_sjc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', 
Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;sjc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;false&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;acme&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;sjc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;false&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;pinnacle&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;nyc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;false&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br 
/&gt;&lt;/div&gt;&lt;div&gt;With this we are ready to start processing some transactions.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Multi-Master Operation&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;At this point we have multi-master replication enabled between hosts logos1 and logos2. &amp;nbsp;You can try it out. &amp;nbsp;Let's add the acme database to the sjc master on logos1 as an example.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mysql -utungsten -psecret -hlogos1&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt;&amp;nbsp;create database acme;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; use acme&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; create table foo (id int);&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; insert into foo values(1);&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We can see that all of these commands replicate over to the logos2 server quite easily with the following command:&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mysql -utungsten -psecret -hlogos2 -e 'select * from acme.foo'&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;+------+&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, 
monospace;"&gt;| id &amp;nbsp; |&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;+------+&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;| &amp;nbsp; &amp;nbsp;1 |&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;+------+&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;That seems pretty simple. &amp;nbsp;In fact it is. &amp;nbsp;You can go over to logos2 and enter transactions for pinnacle in the same way. &amp;nbsp;Data replicate back and forth. &amp;nbsp;There are no replication loops. &amp;nbsp;There are also no conflicts. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;b&gt;Adding a New Shard&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;So what happens when we add a new shard? &amp;nbsp;The simplest way to see is to create a database using a schema name that does not exist in the shard map. &amp;nbsp; Let's try to create a database named superior on the nyc master. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mysql -utungsten -psecret -hlogos2 -e 'create database superior'&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now check the status of the nyc master replication service. &amp;nbsp;We see it has failed with an error due to the unknown shard. 
&amp;nbsp; (Tungsten parses the create database command and assigns it the shard ID "superior.")&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ trepctl -host logos2 -service nyc status&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;Processing status command...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;NAME &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; VALUE&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;---- &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; -----&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;pendingError &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : Stage task failed: binlog-to-q&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;pendingErrorCode &amp;nbsp; &amp;nbsp; &amp;nbsp; : NONE&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;pendingErrorEventId &amp;nbsp; &amp;nbsp;: mysql-bin.000157:0000000000002475;1287&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;pendingErrorSeqno &amp;nbsp; &amp;nbsp; &amp;nbsp;: 8&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 
'Courier New', Courier, monospace;"&gt;pendingExceptionMessage: Rejected event from unknown shard: seqno=8 shard ID=superior&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;state &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;: OFFLINE:ERROR...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;Finished status command...&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This problem is quite easy to fix. &amp;nbsp;We just open up the shard.map file and add a row for superior so that the file contents look like the following:&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;b&gt;shard_id&lt;/b&gt;&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;b&gt;	&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;b&gt;master&lt;/b&gt;&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;b&gt;	&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;b&gt;critical&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tungsten_nyc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span 
class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;nyc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;false&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tungsten_sjc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;sjc&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;false&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;acme&lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;	&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; sjc &amp;nbsp; &amp;nbsp; false&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;pinnacle &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;nyc &amp;nbsp; &amp;nbsp; 
false&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;superior &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;nyc &amp;nbsp; &amp;nbsp; false&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Reload the shard.map file as shown previously and you will be back in business. &amp;nbsp;Incidentally, if you do not want the&amp;nbsp;superior&amp;nbsp;database to be replicated to other masters, you can also specify this in the rules. &amp;nbsp;Just give superior the special master name #LOCAL as in the following example and it will not replicate outside the nyc service.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;superior &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;#LOCAL &amp;nbsp;false&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;In fact, #LOCAL means that &lt;i&gt;&lt;u&gt;any&lt;/u&gt;&lt;/i&gt; schema named superior will not replicate outside the service in which it is defined. &amp;nbsp;You can have an unshared schema named superior on every master. &amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Where to Next?&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;The shard support described in this article is now part of Tungsten 2.0.4 and will appear in the official build when it is finally ready. 
&amp;nbsp;You can try it out right now using one of our &lt;a href="http://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/index.html"&gt;handy nightly builds&lt;/a&gt;. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We plan to build out shard filtering quite a bit from the current base. &amp;nbsp;One immediate fix is to put in a check so that if an application commits shard updates on the wrong DBMS server, the master replication service on that server will detect it and fail. &amp;nbsp;This will tell you there's a problem immediately rather than letting you wallow in blissful ignorance while your data become hopelessly mixed up. &amp;nbsp;We will also simplify the commands to update shards while replicators are online.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Longer term, we will be adding features to propagate shard definitions through replication itself. &amp;nbsp;Stay tuned for more work in this area. &amp;nbsp;If you want to help fund work to enable your own applications, please get in contact with me at Continuent. &amp;nbsp;I can think of at least a dozen ways to make our multi-master support better, but it's always nicer to spend the effort on features that enable real systems. &amp;nbsp;In the meantime, I hope you find multi-master with shard filtering useful and look forward to your feedback. 
&amp;nbsp;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-8949677719498934930?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/8949677719498934930/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=8949677719498934930' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/8949677719498934930'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/8949677719498934930'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/08/practical-multi-master-replication.html' title='Practical Multi-Master Replication using Shard Filters'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-duGJhDfOA4A/Tlu15ygBXXI/AAAAAAAAAI4/GTmlDjP6Q2M/s72-c/multi-master-variations.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-725765465870062033</id><published>2011-08-24T12:08:00.000-07:00</published><updated>2011-08-24T12:53:49.875-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='Tungsten'/><title type='text'>First the Blog, now the Webinar:  Adding Parallel Replication to MySQL 
in a Hurry</title><content type='html'>&lt;div&gt;My recent post on &lt;a href="http://scale-out-blog.blogspot.com/2011/08/adding-parallel-replication-to-mysql-in.html"&gt;setting up Tungsten parallel replication in a hurry&lt;/a&gt;&amp;nbsp;got a lot of hits, though to be fair it was probably not the great writing but the fact that at least one &lt;a href="http://www.facebook.com/MySQLatFacebook"&gt;popular MySQL blog&lt;/a&gt; posted a link to it. &amp;nbsp;(Thanks, we like you guys too.) &amp;nbsp;Anyway, I would like to invite anybody who is interested in parallel replication to attend a &lt;a href="https://www1.gotomeeting.com/register/946233905"&gt;webinar on Thursday September 1st at 10am PDT&lt;/a&gt;&amp;nbsp;to cover installing and using Tungsten. &amp;nbsp;It's a straight-up technical talk to help you start quickly.&amp;nbsp;&lt;/div&gt;&lt;br /&gt;Bringing up Tungsten on an existing MySQL slave only takes a few minutes, so once we have that out of the way I will explain how Tungsten works inside and show you some of the tricks for getting your applications to play nice with parallel replication as well as how to tune performance. &amp;nbsp;The idea is to minimize fluffy architectural stuff and maximize lab demos that help you bend replication to your will. &amp;nbsp;The talk will also cover how to get help, log bugs, and even add your own code. &amp;nbsp;Plus there will be lots of time for questions. &lt;br /&gt;&lt;br /&gt;As most readers of this blog know, Tungsten Replicator is open source (&lt;a href="http://www.gnu.org/licenses/old-licenses/gpl-2.0.html"&gt;GPL V2&lt;/a&gt;) and &lt;a href="http://code.google.com/p/tungsten-replicator/"&gt;hosted on code.google.com&lt;/a&gt;. &amp;nbsp;If you miss the webinar you may be able to catch up on parallel replication in person in London toward the end of October. 
&amp;nbsp;I just submitted a talk to the next &lt;a href="http://www.percona.com/live/london-2011/"&gt;Percona Live&lt;/a&gt; and hope it gets accepted. &amp;nbsp;If so, see you there! &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-725765465870062033?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/725765465870062033/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=725765465870062033' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/725765465870062033'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/725765465870062033'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/08/first-blog-now-webinar-adding-parallel.html' title='First the Blog, now the Webinar:  Adding Parallel Replication to MySQL in a Hurry'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-3086191220848581838</id><published>2011-08-19T12:51:00.000-07:00</published><updated>2011-08-19T12:51:05.073-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='MariaDB'/><category scheme='http://www.blogger.com/atom/ns#' term='SaaS'/><title type='text'>The System of 
Record Approach to Multi-Master Database Applications</title><content type='html'>Multi-master database systems that span sites are an increasingly common requirement in business applications. &amp;nbsp;Yet the way such applications work in practice is not quite what you would think from accounts of NoSQL systems like &lt;a href="http://en.wikipedia.org/wiki/Apache_Cassandra"&gt;Cassandra&lt;/a&gt; or SQL-based systems like &lt;a href="http://www.oracle.com/technetwork/database/clustering/overview/index.html"&gt;Oracle RAC&lt;/a&gt;. &amp;nbsp;In this article I would like to introduce a versatile design pattern for&amp;nbsp;multi-master&amp;nbsp;SQL applications in which individual schemas are updated in a single location only but may have many copies elsewhere both locally as well as on other sites. &amp;nbsp;This pattern is known as a &lt;a href="http://en.wikipedia.org/wiki/System_of_record"&gt;system of record&lt;/a&gt; architecture. &amp;nbsp;You can build it with off-the-shelf MySQL and master/slave replication. &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's start by picking a representative software-as-a-service (SaaS) application: &amp;nbsp;call center automation. &amp;nbsp; &amp;nbsp;Call center software integrates with a local PBX or VOIP to allow agents to answer and make phone calls for telemarketing campaigns in a systematic and automated way using standard procedures known as "agent scripts." Admins set up agent scripts and define lists of people to call as well as marketing campaigns. &amp;nbsp;Finally and perhaps most importantly, managers receive a wide variety of detailed reports that allow them to optimize current work, examine historical performance, and make predictions about the future for planning purposes. &amp;nbsp;Here is a typical application architecture. 
&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-x7eoX13mTfM/Tk1iaOhNElI/AAAAAAAAAII/qDLK0rNG1SA/s1600/sor-call-center-automation.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="356" src="http://1.bp.blogspot.com/-x7eoX13mTfM/Tk1iaOhNElI/AAAAAAAAAII/qDLK0rNG1SA/s400/sor-call-center-automation.jpg" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Figure 1: &amp;nbsp; Call Center Application Architecture&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;Bear in mind that this is a greatly simplified view. &amp;nbsp;Like most business applications, call center automation systems may contain hundreds of database tables and many types of user services. &amp;nbsp;There are also practical complications that go beyond the application itself. &amp;nbsp; &amp;nbsp;Call center automation is vital to the businesses that use it. &amp;nbsp;Customers want assurance&amp;nbsp;they can continue processing on another site if a SaaS vendor site goes dark. &amp;nbsp;This means we have to think about maintaining applications and data on multiple sites.&lt;br /&gt;&lt;br /&gt;The ideal solution for most SaaS vendors would be to have call center data and applications for all customers live on multiple sites at all times. &amp;nbsp;Multiple live sites mean that failover is instantaneous since both applications and database servers are already up and running. &amp;nbsp;Constant update means there is little or no data loss on failure. Customers could connect to the nearest site. &amp;nbsp;Here is a picture of that dream that includes two sites and two customers, Acme Inc. 
and Pinnacle, Ltd.&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-FsUehzC0aTA/Tk2DVcJpc0I/AAAAAAAAAIY/fpitIExkjWE/s1600/sor-multi-master-dream.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="226" src="http://4.bp.blogspot.com/-FsUehzC0aTA/Tk2DVcJpc0I/AAAAAAAAAIY/fpitIExkjWE/s400/sor-multi-master-dream.jpg" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Figure 2: &amp;nbsp;Dream Architecture for Call Center Automation&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;This solution has only one problem. &amp;nbsp;It is impossible to build. &amp;nbsp;Readers may nod wisely and say this is because of &lt;a href="http://en.wikipedia.org/wiki/CAP_theorem"&gt;CAP Theorem&lt;/a&gt; limitations, but that somewhat misses the point. &amp;nbsp;Let's say we use a&amp;nbsp;NoSQL DBMS like Cassandra that permits updates in multiple locations and reconciles the data using eventual consistency. &amp;nbsp; However, there's a catch: &amp;nbsp;as we saw above, much of the value of call center automation is in operational queries and reports. &amp;nbsp;That drives you back to an RDBMS with cross-table joins, aggregation functions, referential integrity, and convenient SQL-based report writing tools. &amp;nbsp;For this reason alone, Cassandra is a non-starter for call center automation. &lt;br /&gt;&lt;br /&gt;What about a SQL DBMS? &amp;nbsp;MySQL obviously has all the features you need for query-intensive solutions on smallish data sets (e.g. hundreds of millions of rows, not many billions or trillions). 
&amp;nbsp;The problem is multi-master replication. &amp;nbsp; Updating the same table from two or more places on a LAN is already quite difficult: &amp;nbsp;witness the complexity of Oracle RAC or MySQL Cluster. &amp;nbsp;The problem becomes intractable when you combine complex SQL transactions, referential integrity, and high-latency WAN connections. &amp;nbsp;If you want full SQL semantics you cannot have updates on multiple sites. &amp;nbsp; This is a serious dilemma and not just for call center automation. &amp;nbsp;The same problems or worse affect a multitude of valuable business applications including market automation, credit card processing, customer relationship management (CRM), time/expense tracking, accounting, and many others. &lt;br /&gt;&lt;br /&gt;Fortunately we are not really stuck. &amp;nbsp;If we give up some requirements customers do not really want anyway, there is a perfectly good solution that will work for a wide range of problems. &amp;nbsp;Data warehousing architects long ago developed the notion of a &lt;i&gt;system of record&lt;/i&gt;. &amp;nbsp;Bill Inmon's classic &lt;a href="http://www.amazon.com/Building-Data-Warehouse-W-Inmon/dp/product-description/0764599445"&gt;&lt;i&gt;Building the Data Warehouse&lt;/i&gt;&lt;/a&gt; defined system of record as follows:&lt;br /&gt;&lt;blockquote&gt;The definitive and singular source of operational data. &amp;nbsp;If data element abc has a value of 25 in a database record but a value of 45 in the system of record, by definition the first value is incorrect and must be reconciled. &amp;nbsp;&lt;/blockquote&gt;System of record applies to multi-master systems in the form of a simple rule. &amp;nbsp;We just assert that every customer has master data in one and only one location and copies everywhere else. &amp;nbsp;When particular customers update information they do so on their own master. 
&amp;nbsp;Customers can have masters on different hosts or sites, but the system of record rule says that no customer has one in two places. &amp;nbsp;This eliminates conflicts between masters, and multi-master replication now works without a lot of difficulty.&lt;br /&gt;&lt;br /&gt;System of record thus meets the original requirement of having data on multiple sites, which was to handle a site failure. &amp;nbsp;We can store data economically using off-the-shelf MySQL. &amp;nbsp;We can update copies within and across sites using master/slave replication. &amp;nbsp;We can shard customer data into independent schemas. The result looks like the following. &amp;nbsp;Acme has a master in San Jose, whereas Pinnacle has its master in New York.&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-50wGXVucP9I/Tk2l13DEh5I/AAAAAAAAAIo/QIOkELGpAPk/s1600/sor-multi-master-reality.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="208" src="http://1.bp.blogspot.com/-50wGXVucP9I/Tk2l13DEh5I/AAAAAAAAAIo/QIOkELGpAPk/s400/sor-multi-master-reality.jpg" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;System of Record Architecture for Call Center Automation&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;Using the system of record approach simplifies other problems as well. &amp;nbsp;Standard backup and restore techniques still work. &amp;nbsp;If you mess up a customer copy, you re-provision from the master shard. &amp;nbsp;You can implement failover across sites and also fail over locally onto slave copies, which can be complete copies containing data for all customers.&lt;br /&gt;&lt;br /&gt;Meanwhile, most users are fine with a single site. 
&amp;nbsp;Pinnacle is close to New York, which is why the SaaS vendor puts Pinnacle's data there and gives them the New York site DNS for login. &amp;nbsp; It is also possible to run reports on the cross-site copies. &amp;nbsp;You can even run full applications provided you forward writes to the system of record, as shown above for Acme.&lt;br /&gt;&lt;br /&gt;The real issue in implementing system of record architectures is that existing replication and clustering tools are not quite up to the job of handling cross-site applications built on system of record. &amp;nbsp;We are extending &lt;a href="http://code.google.com/p/tungsten-replicator/"&gt;Tungsten&lt;/a&gt; to handle some of the obvious problems in building these types of systems using MySQL. &lt;br /&gt;&lt;ol&gt;&lt;li&gt;Locating the customer master and connecting applications to it.&amp;nbsp;&lt;/li&gt;&lt;li&gt;Moving the customer master from one location to another. &amp;nbsp;This happens more often than you would think, for example to minimize multi-master replication which can introduce problems beyond conflicts.&amp;nbsp;&lt;/li&gt;&lt;li&gt;Detecting accidental updates to copies and preventing them from reaching the DBMS, from propagating to other locations, or both. &amp;nbsp;&lt;/li&gt;&lt;li&gt;Providing a clean failover model that works on both cross-site and local copies of data.&amp;nbsp;&amp;nbsp;&lt;/li&gt;&lt;li&gt;Recovering corrupted copies of customer data from the master. &amp;nbsp;&lt;/li&gt;&lt;/ol&gt;I will discuss two of the upcoming Tungsten features in follow-up articles. &amp;nbsp;The first is assigning a shard master in the Tungsten Replicator using the new shard API. &amp;nbsp;The shard API enables multi-master but enforces system of record constraints to avoid messing up data should you accidentally update in the wrong location. &amp;nbsp;The second feature is cross-site management and connectivity using Tungsten Enterprise. 
&amp;nbsp;This handles failovers within and between sites and automatically connects applications to the active master regardless of which site or DBMS it lives on.&lt;br /&gt;&lt;br /&gt;The need for availability is pushing an increasing number of SaaS vendors and other application providers to operate systems across multiple sites. &amp;nbsp;Applications like call center automation depend on the features of SQL and cannot be implemented using NoSQL DBMS's like Cassandra. &amp;nbsp;The&amp;nbsp;system of record architecture eliminates replication conflicts and enables multi-master updates to work on ordinary SQL databases between sites. &amp;nbsp;If you are building complex SQL applications and thinking about going multi-site, this design pattern should be in your toolbox. &amp;nbsp;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-3086191220848581838?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/3086191220848581838/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=3086191220848581838' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3086191220848581838'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3086191220848581838'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/08/system-of-record-approach-to-multi.html' title='The System of Record Approach to Multi-Master Database Applications'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-x7eoX13mTfM/Tk1iaOhNElI/AAAAAAAAAII/qDLK0rNG1SA/s72-c/sor-call-center-automation.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-474854520384624313</id><published>2011-08-18T20:38:00.000-07:00</published><updated>2011-08-18T21:37:13.334-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MariaDB'/><category scheme='http://www.blogger.com/atom/ns#' term='Drizzle'/><title type='text'>So Where's the Fall MySQL Community Conference?</title><content type='html'>Last week Percona announced plans to sponsor the &lt;a href="http://www.mysqlperformanceblog.com/2011/08/09/announcing-percona-live-mysql-conference-and-expo-2012/"&gt;Percona MySQL Conference&lt;/a&gt; in Santa Clara in April 2012. &amp;nbsp;It is meant to replace the O'Reilly conferences of previous years. &amp;nbsp;The announcement led to some reasonable questions, for example &lt;a href="http://datacharmer.blogspot.com/2011/08/call-for-disclosure-on-mysql-conference.html"&gt;from Giuseppe Maxia&lt;/a&gt;. &amp;nbsp;These and &lt;a href="http://rpbouman.blogspot.com/2011/08/regarding-mysql-conference-and-expo.html"&gt;other online posts&lt;/a&gt;&amp;nbsp;initiated a thoughtful exchange of views about the pros and cons of Percona's conference announcement by various members of the MySQL community. 
&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-DvPXWc9R_1w/Tk2-Y7CTfGI/AAAAAAAAAIw/MbzjFJfQ1OI/s1600/asterix-fight.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="232" src="http://4.bp.blogspot.com/-DvPXWc9R_1w/Tk2-Y7CTfGI/AAAAAAAAAIw/MbzjFJfQ1OI/s400/asterix-fight.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Not everyone agrees with what Percona is doing. &amp;nbsp;However, you have to give Peter, Baron, and others at Percona credit for taking the risk to organize a replacement conference. &amp;nbsp;It's a big financial commitment to rent space in Santa Clara. &amp;nbsp;The conference will be a huge amount of work, much of it quite thankless. &amp;nbsp;Even so I imagine it might be tempting for some people to try to organize an anti-conference just to spite Percona, much as Percona did at the MySQL Conference in 2009. &amp;nbsp;This would be a mistake. &lt;br /&gt;&lt;br /&gt;Here is a better suggestion. &amp;nbsp;Why not put the energy into organizing a Fall 2012 MySQL Community Conference in Europe? &amp;nbsp; Make it somewhere pleasant like Barcelona or Verona and everybody will want to go. &amp;nbsp;Two solid conferences in different locations at opposite ends of the calendar would benefit the MySQL community at all levels and make it possible for more people to attend. &amp;nbsp;For those of you who want to grow the community, it's time to stand up and be counted. &amp;nbsp;Let's get another conference off the ground. 
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-474854520384624313?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/474854520384624313/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=474854520384624313' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/474854520384624313'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/474854520384624313'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/08/so-wheres-fall-mysql-community.html' title='So Where&apos;s the Fall MySQL Community Conference?'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-DvPXWc9R_1w/Tk2-Y7CTfGI/AAAAAAAAAIw/MbzjFJfQ1OI/s72-c/asterix-fight.jpg' height='72' width='72'/><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-7311363261462410731</id><published>2011-08-13T01:18:00.000-07:00</published><updated>2011-08-31T23:13:27.941-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='Tungsten'/><title type='text'>Adding Parallel Replication to MySQL in a Hurry</title><content type='html'>A previous article on this blog 
described &lt;a href="http://scale-out-blog.blogspot.com/2011/07/introducing-tungsten-on-disk-queues-for.html"&gt;Tungsten parallel replication using on-disk queues&lt;/a&gt;. &amp;nbsp;On-disk queues are now more or less finished, and I just closed the &lt;a href="http://code.google.com/p/tungsten-replicator/issues/detail?id=82"&gt;covering issue&lt;/a&gt; for the feature. &amp;nbsp;The work is bug fixing and performance testing from here on out. &amp;nbsp;Speaking of performance, that looks fairly good. &amp;nbsp; A recent on-site test using production workloads showed a 3.3X improvement over native MySQL replication while holding resources like memory down to much more reasonable levels than in-memory queues. &amp;nbsp;We have further optimizations on the way, so this should improve. &lt;br /&gt;&lt;br /&gt;Now that parallel replication is working a lot better, what is it good for? &amp;nbsp;Here is a good start: &amp;nbsp;assuming your workload &lt;a href="http://scale-out-blog.blogspot.com/2011/03/tuning-tungsten-parallel-replication.html"&gt;is suitable&amp;nbsp;for shard-based replication&lt;/a&gt;, Tungsten offers a nice replacement for MySQL slave replication that will give you immediate parallel replication for any MySQL server from version 5.0 onward. &amp;nbsp;We call this native slave takeover. &amp;nbsp;It looks like the following diagram. 
&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-PtTU38wa4vc/TkYyiiBzGAI/AAAAAAAAAH8/6xnY0wke3i8/s1600/Tungsten-native-slave-takeover.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="236" src="http://4.bp.blogspot.com/-PtTU38wa4vc/TkYyiiBzGAI/AAAAAAAAAH8/6xnY0wke3i8/s400/Tungsten-native-slave-takeover.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;You can migrate MySQL slaves to Tungsten very easily thanks to some great work on the Tungsten installer by Jeff Mace that Giuseppe Maxia &lt;a href="http://datacharmer.blogspot.com/2011/08/usability-improvements-in-tungsten-204.html"&gt;also praised in a recent article.&lt;/a&gt;&amp;nbsp;&amp;nbsp;Seeing that Giuseppe is also the Continuent QA director, I guess this means it passed. &amp;nbsp;:) &amp;nbsp; Here is how to set up.&lt;br /&gt;&lt;br /&gt;First, configure MySQL master/slave replication. &amp;nbsp;Here are typical commands after you synchronize the slave with a backup from the master. &amp;nbsp;For these examples, we assume the slave is running on host mercury, and the master is on host saturn. 
&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mysql -utungsten -psecret -hmercury -e "CHANGE MASTER TO \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;MASTER_HOST='saturn', \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;MASTER_USER='repl', \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;MASTER_PASSWORD='s3cr3t', \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;MASTER_LOG_FILE='mysql-bin.000338', \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;MASTER_LOG_POS=4534"&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mysql -utungsten -psecret -hmercury -e "START SLAVE"&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mysql -utungsten -psecret -hmercury -e "SHOW SLAVE STATUS\G"&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Next, download and unpack Tungsten from the &lt;a href="http://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/index.html"&gt;latest dev build&lt;/a&gt;. &amp;nbsp;Here are sample commands for build 231. 
&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;wget --no-check-certificate https://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/tungsten-replicator-2.0.4-231.tar.gz&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tar -xvzf tungsten-replicator-2.0.4-231.tar.gz&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;cd tungsten-replicator-2.0.4-231&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;/span&gt;Finally, run the installation using the following handy command. &amp;nbsp;This will copy the download into its proper location (here /opt/continuent) and start Tungsten. 
&amp;nbsp;Tungsten will stop MySQL replication if it is running and start Tungsten replication from the exact point where native replication left off.&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tools/tungsten-installer \&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--direct &amp;nbsp;\&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--native-slave-takeover \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--master-host=saturn &amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--master-user=repl&amp;nbsp;&amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--master-password=s3cr3t&amp;nbsp;&amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--slave-host=mercury \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--slave-user=tungsten &amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--slave-password=secret &amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--service-name=takeover \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', 
Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--home-directory=/opt/continuent \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--svc-parallelization-type=disk \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--channels=10 \&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--start-and-report&lt;/span&gt;&lt;/div&gt;&lt;div&gt;If there is a problem with your pre-requisites the installer will print out a message and stop. &amp;nbsp;Most of these messages should be pretty self-explanatory. &amp;nbsp; (In some builds there's a problem with &lt;a href="http://code.google.com/p/tungsten-replicator/issues/detail?id=182"&gt;OpenJDK failing the install check&lt;/a&gt;. &amp;nbsp;Use --no-validation to get around that.)&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here is another interesting tip. &amp;nbsp;If you try Tungsten and want to go back to MySQL replication, you can flip back to MySQL native replication with two commands. &amp;nbsp;Just take Tungsten offline cleanly, which shuts down all the replicator channels at the same point, and start MySQL replication again:&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;trepctl offline&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mysql -urepl -ps3cr3t -e 'START SLAVE'&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We are currently doing a lot of performance profiling and testing of parallel replication. &amp;nbsp;I hope to post more about performance results in a future article. 
&amp;nbsp;Meanwhile, there are 36 new features (including on-disk queues) and bug fixes that will roll into a final Tungsten 2.0.4 in the next week or so. &amp;nbsp; Try out the latest builds and see what you think.&amp;nbsp;&lt;/div&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-7311363261462410731?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/7311363261462410731/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=7311363261462410731' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/7311363261462410731'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/7311363261462410731'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/08/adding-parallel-replication-to-mysql-in.html' title='Adding Parallel Replication to MySQL in a Hurry'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-PtTU38wa4vc/TkYyiiBzGAI/AAAAAAAAAH8/6xnY0wke3i8/s72-c/Tungsten-native-slave-takeover.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-7220151185516599317</id><published>2011-07-24T09:35:00.000-07:00</published><updated>2011-07-24T09:36:16.741-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><title type='text'>Mobile Internet Access in Germany for Open Source Road Warriors</title><content type='html'>&lt;div&gt;Reliable Internet access is a long-standing problem for road warriors visiting foreign countries. &amp;nbsp;Open source developers in particular have problems reconciling travel with addiction to high-bandwidth network access from laptop computers. &amp;nbsp;Wi-Fi hotspots are scarce, costly, often slow, and in some cases complicated by inconvenient local laws like &lt;a href="http://blog.dlapiper.com/IPTitaly/entry/public_wi_fi_access_to1"&gt;Italy's Pisanu Decree&lt;/a&gt;. &amp;nbsp;International mobile network access plans are ridiculously expensive or, like &lt;a href="http://www.droam.nl/en/faq/"&gt;DROAM&lt;/a&gt;, have download limits that make them useless for serious programming. &lt;br /&gt;&lt;br /&gt;The best solution in many cases is to look for a local pre-paid mobile access plan in each country you visit. &amp;nbsp; Mobile networks are widely available and fast in developed regions, and there are cheap plans that limit the amount you pay while providing solid connectivity. &amp;nbsp;On a recent trip to Germany&amp;nbsp;I found a great solution for local Internet access from &lt;a href="http://www.fonic.de/"&gt;FONIC&lt;/a&gt;, which offers a prepaid data plan using their FONIC Surf-Stick. 
&amp;nbsp;The Surf-Stick is a USB modem that plugs into a USB port on your laptop and looks like the following.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-ZCCHmcKY6m4/TiwsSAe7shI/AAAAAAAAAHc/U8OSsWsC8xw/s1600/phonic-usb-modem.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="179" src="http://3.bp.blogspot.com/-ZCCHmcKY6m4/TiwsSAe7shI/AAAAAAAAAHc/U8OSsWsC8xw/s320/phonic-usb-modem.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;picture&gt;&lt;/picture&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;After you buy the Surf Stick, you can add money to it using credits purchased in local stores. &amp;nbsp;It's a relatively simple solution provided you understand a little about networking and can work your way through a bit of German.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The rest of this article is a description of how to use FONIC in Germany, as well as a review of the performance and a couple of downsides. &amp;nbsp;I&amp;nbsp;do not have any connection with FONIC other than using their products. &amp;nbsp;You may have different experiences or find something better. &amp;nbsp;If so, please write an article about it. &lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Getting Started&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;I bought my Surf Stick for 49 Euro at &lt;a href="http://www.saturn.de/"&gt;Saturn&lt;/a&gt;, a big German consumer electronics chain. &amp;nbsp;You get the modem itself plus a SIM card and one day of free surfing. &amp;nbsp;You can also get the stick from other &lt;a href="http://www.fonic.de/html/filialfinder.html"&gt;FONIC partners&lt;/a&gt; or order it off the FONIC website. &amp;nbsp;Surprisingly it does not appear to be available in places like Frankfurt or Munich Airport. 
&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The first step once you have the modem is to initialize the SIM on FONIC's site by registering yourself as a new user. &amp;nbsp;To do this you'll need to have your FONIC phone number and activation code ("Freischaltungsnummer"), which are written on the side of the envelope that contains the SIM and that you should try to avoid losing. &amp;nbsp;You'll also need a German address that you enter as part of the sign-up. &amp;nbsp;A hotel address is fine. &amp;nbsp;It takes a couple of hours after registration to complete provisioning. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Once provisioned, connecting is easy. &amp;nbsp;Put the SIM in the surf stick and plug the stick into a USB port on your computer. &amp;nbsp;On Mac OS X, you'll see an application called "Mobile Partner." &amp;nbsp;Here are the steps to activate a connection:&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Click the Mobile Partner app. &amp;nbsp;&lt;/li&gt;&lt;li&gt;Enter your PIN (also on the SIM envelope) and press Return.&amp;nbsp;&lt;/li&gt;&lt;li&gt;The application will stall for a few seconds while it looks for the modem. &amp;nbsp;Once complete it will show a screen entitled "Verbinden" (connect). &amp;nbsp;&amp;nbsp;&lt;/li&gt;&lt;li&gt;Click Verbinden. &amp;nbsp;A couple of pop-ups will appear and you are online.&amp;nbsp;&lt;/li&gt;&lt;li&gt;When you want to drop the connection click "Trennen" (disconnect). &amp;nbsp;&lt;/li&gt;&lt;/ol&gt;You will now need to load some money into your account. &amp;nbsp;There's no way to top up online with a credit card--you either have to register a German bank account or purchase credits. &amp;nbsp;For foreigners the only practical thing is therefore to buy a credit, which is called an "Aufladebon." &amp;nbsp;You can get them at stores like Lidl, Rossmann, and Jet as well as Esso gas stations. 
&amp;nbsp;Just ask for the following in German: &lt;i&gt;&amp;nbsp;FONIC Aufladebon für 20 Euro&lt;/i&gt;, which is a 20 Euro credit and gives you 8 days of surfing. &amp;nbsp; I recommend buying this at the same store where you get your modem, which means the start-up cost is about 70 Euros. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The Aufladebon itself is a bit disappointing: &amp;nbsp;a printed receipt with a 12 digit number and some instructions in German. &amp;nbsp;Start your Mobile Partner app, connect to the Internet, and click on "Guthaben verwalten" (manage credit). &amp;nbsp;Select "Guthabenkarte aufladen" (load credit), enter your 12 digit number, and press enter. &amp;nbsp; Your credit is now topped up for a while. &amp;nbsp;You can check the level using the "Guthabenabfrage" (check credit) button. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Performance&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I used the FONIC modem for about a week. &amp;nbsp;It functioned well in every location that had at all reasonable connectivity. &amp;nbsp;FONIC handles transitions across cells really well--provided there's connectivity it seems to use the fastest available protocol, such as &lt;a href="http://en.wikipedia.org/wiki/High_Speed_Packet_Access"&gt;HSPA&lt;/a&gt;, and switches seamlessly down to lower capacity protocols like &lt;a href="http://en.wikipedia.org/wiki/Enhanced_Data_Rates_for_GSM_Evolution"&gt;Edge&lt;/a&gt; if that is all that is available. &amp;nbsp;You can use it in a car or train without experiencing application problems--I used it on the Autobahn between Hamburg and Lübeck and it worked surprisingly well. &amp;nbsp; (Just to be clear, somebody else was driving at the time.)&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;With good connectivity, the rated speed is 7.2Mbps down and 5.76Mbps up, which is comparable to average household connections. 
&amp;nbsp;PDFs and similar files downloaded at 150K/sec. &amp;nbsp;Upload was similarly snappy. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Applications like ssh and even remote debugging/profiling worked very well. &amp;nbsp;In my case this included running&amp;nbsp;Yourkit Java profiler connecting into an application running on servers elsewhere in Germany. &amp;nbsp;I also hosted GotoMeeting web conferences. &amp;nbsp;The latency seems higher than DSL--typical latencies run about 100 to 200ms, which is about the same latency you get when connecting from the US West Coast to sites in Europe.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Daily transfer limits are very competitive. &amp;nbsp;The current limit is 500MB per day at full speed access, though in practice it seemed to cut off around 600MB for me. &amp;nbsp;This is better than offerings from the big guys like T-Mobile and Vodafone. &amp;nbsp;FONIC implements the limit quite elegantly. &amp;nbsp;First you get a nice SMS that you are near the limit, followed by another that you are over the limit. &amp;nbsp;However, after that you do not get cut off. &amp;nbsp;The connection just switches to 64kbps until the next day. &amp;nbsp;This is still faster than the Wi-Fi connectivity in many hotel rooms. &lt;br /&gt;&lt;br /&gt;If you don't want to wait for the friendly SMS message about limits, you can check your limits easily using the Mobile Partner app. 
&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-6ZFemMVqn2I/Tiw2eZ5y21I/AAAAAAAAAHg/CTmuh3h6wmk/s1600/fonic-app.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="253" src="http://2.bp.blogspot.com/-6ZFemMVqn2I/Tiw2eZ5y21I/AAAAAAAAAHg/CTmuh3h6wmk/s400/fonic-app.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Speaking of the app, Mobile Partner seems really solid. &amp;nbsp;There were no bugs, crashes or weird hangs the way you get with some of the US providers like Verizon's prepaid Internet app. I only ran it on Mac OS X but I would guess it is just as good on Windows. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Caveats&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In general FONIC is outstanding. &amp;nbsp; Life is not perfect, so here are a few of the limitations I found. &amp;nbsp;&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;German. &amp;nbsp;FONIC is 100% German, including the website and the app. &amp;nbsp;You'll need to know some German or make friends fast. &amp;nbsp;Also, the data plan apparently only works in Germany. &amp;nbsp;&lt;/li&gt;&lt;li&gt;Phone connectivity. &amp;nbsp;If your cell phone does not work, FONIC won't either. &amp;nbsp;The nice thing is that you can have connectivity drop out for a bit while crossing cells in a train or car without losing TCP/IP sessions. &amp;nbsp;&lt;/li&gt;&lt;li&gt;Occasional routing problems.&amp;nbsp;On one evening in one particular location the FONIC modem did not work at all well, reporting ping latencies of 50+ seconds or dropping out entirely. &amp;nbsp;It was not apparent whether this was due to poor cell coverage or another issue. 
&amp;nbsp;It worked fine everywhere else.&lt;/li&gt;&lt;/ol&gt;There appear to be some competitors to FONIC but I have no idea how well they work. &amp;nbsp;One of the advantages of FONIC is that they have many partners in Germany that sell both the modems and the credits. &amp;nbsp;That would also be a consideration in evaluating competing products.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Summary&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;Some day it may be possible to get decent mobile data plans that work for international travel. &amp;nbsp;Given the complexity of telecom regulation and the motivations of local carriers to prevent competition, you probably don't want to hold your breath until they arrive.&lt;br /&gt;&lt;br /&gt;In the meantime, FONIC looks like a good Internet access solution for anybody traveling to Germany for more than a few days who needs constant, high-capacity access. &amp;nbsp;I was stuck in a couple of locations that had no network access, so it was a life-saver in those cases. &amp;nbsp;However, FONIC is also cheaper than hotel Wi-Fi and has better performance than many Wi-Fi hotspots. &amp;nbsp;&amp;nbsp;If you move about or are uncertain about the quality of the Internet connectivity where you are staying, I highly recommend it. &lt;br /&gt;&lt;br /&gt;If you know of similar plans for other countries, please share your experiences. &amp;nbsp;I would love to find something like FONIC for Italy, Spain, and the US. 
&amp;nbsp;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-7220151185516599317?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/7220151185516599317/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=7220151185516599317' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/7220151185516599317'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/7220151185516599317'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/07/mobile-internet-access-in-germany-for.html' title='Mobile Internet Access in Germany for Open Source Road Warriors'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-ZCCHmcKY6m4/TiwsSAe7shI/AAAAAAAAAHc/U8OSsWsC8xw/s72-c/phonic-usb-modem.jpg' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-5335306551034095533</id><published>2011-07-03T13:45:00.000-07:00</published><updated>2011-07-03T13:45:16.777-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='Tungsten'/><title type='text'>Introducing Tungsten On-Disk Queues for Parallel 
Replication</title><content type='html'>&lt;a href="http://code.google.com/p/tungsten-replicator"&gt;Tungsten Replicator&lt;/a&gt; has offered shard-based parallel replication to slaves &lt;a href="http://scale-out-blog.blogspot.com/2010/10/parallel-replication-on-mysql-report.html"&gt;since late 2010&lt;/a&gt;. &amp;nbsp;The initial implementation uses in-memory queues. &amp;nbsp;Working purely in memory keeps latency low and throughput high. &amp;nbsp; On the other hand, working in memory consumes valuable RAM. &amp;nbsp;It also forces us to buffer all in-flight transactions and therefore greatly limits the span of time permissible between the slowest and fastest shard. &lt;br /&gt;&lt;br /&gt;Hence our newest improvement: &amp;nbsp;on-disk parallel queues. &amp;nbsp;In this article I will cover how parallel replication works in general, how on-disk queues help with parallel replication, and then show how to set up from the latest builds. &lt;br /&gt;&lt;br /&gt;First, let's review the basic mechanics. &amp;nbsp;Parallel replication is really "parallel apply," which means taking a stream of serialized transactions, splitting them into separate streams, and applying them in parallel to a slave DBMS. &amp;nbsp;In the Tungsten pipeline architecture, we implement this kind of flow using a combination of stages and stores. &amp;nbsp;One stage reads transactions from the persistent transaction history log (aka THL) to a "parallel store." &amp;nbsp;The parallel store splits the stream into a set of queues. &amp;nbsp;The next stage extracts from those queues and applies to the slave. 
&amp;nbsp; It looks like the following picture:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-vN8hToV7X2c/ThC2LFSYKEI/AAAAAAAAAHU/o95YzjSI98Q/s1600/Tungsten-Parallel-Queue-Model.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="210" src="http://4.bp.blogspot.com/-vN8hToV7X2c/ThC2LFSYKEI/AAAAAAAAAHU/o95YzjSI98Q/s400/Tungsten-Parallel-Queue-Model.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;From a conceptual point of view the incoming thl-to-q task thread performs an indexing function. &amp;nbsp;It &amp;nbsp;guides construction of the queues read by task threads in the q-to-dbms stage. &amp;nbsp;Within this framework there are many ways to feed events into the parallel queues. &amp;nbsp; &amp;nbsp;In the case of on-disk queues there are two obvious design options. &amp;nbsp; &lt;br /&gt;&lt;ol&gt;&lt;li&gt;Read data out of the THL and split them into separate transaction logs per parallel queue. &amp;nbsp;This is very similar to the in-memory approach, except that the queues are now on disk (or SSD or whatever storage you pick). &amp;nbsp;It can be implemented without adding any extra threads to the parallel store.&amp;nbsp;&lt;/li&gt;&lt;li&gt;Leave all data in THL. &amp;nbsp;Add a cursor for each parallel queue that scans the THL and picks only the transactions that belong in that parallel queue. &amp;nbsp; This requires extra threads to do the scans, hence is more complex to implement. &amp;nbsp;&lt;/li&gt;&lt;/ol&gt;Both approaches achieve the primary goal, which is to keep the transactions in storage until we actually need them and thereby minimize memory usage. &amp;nbsp;This in turn solves a major problem, namely that individual shards can now be many thousands or even millions of transactions apart in the serial history. 
&amp;nbsp;Beyond that, it is not completely obvious which approach is better. &lt;br /&gt;&lt;br /&gt;For example, option 1 isolates the reads to individual files. &amp;nbsp;This minimizes overall I/O at the cost of making it more random, since reads and writes are spread over many files. &amp;nbsp;Option 2 avoids extra writes and keeps I/O sequential, but introduces a bunch of threads doing the equivalent of table scans across the same patch of storage. &amp;nbsp;Up to a point we can assume that pages are coming out of the OS page cache rather than storage but this assumption will not hold for all operating environments and workloads. &amp;nbsp;The only way to prove the trade-offs is to implement and test. &amp;nbsp;(We may end up implementing both.)&lt;br /&gt;&lt;br /&gt;After some discussion internally at Continuent as well as with Harrison Fisk from Facebook, we picked option 2 for the initial implementation. &amp;nbsp;Here is a diagram that shows how it works. &lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-3hiCpt_68Pc/ThC8ZxDLBEI/AAAAAAAAAHY/J5QFa3UUPVM/s1600/Tungsten-THLParallelQueue.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="248" src="http://3.bp.blogspot.com/-3hiCpt_68Pc/ThC8ZxDLBEI/AAAAAAAAAHY/J5QFa3UUPVM/s400/Tungsten-THLParallelQueue.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Here is a quick tour of the implementation in Java class &lt;a href="http://code.google.com/p/tungsten-replicator/source/browse/trunk/replicator/src/java/com/continuent/tungsten/replicator/thl/THLParallelQueue.java"&gt;THLParallelQueue&lt;/a&gt;. &amp;nbsp;This class maintains an in-memory blocking queue for each channel. &amp;nbsp;Each queue has a corresponding read thread that scans the THL and places matching events into the queue. 
&amp;nbsp;The THLParallelQueue class synchronizes read threads and handles issues like serialization and clean shutdown. &amp;nbsp;The queues therefore consume some memory, but they are quite small and use far less than keeping all transactions in memory.&lt;br /&gt;&lt;br /&gt;So much for the theoretical description. If you would like to test on-disk queues yourself, you can get started in three steps. &lt;br /&gt;&lt;ol&gt;&lt;li&gt;Download the &lt;a href="http://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/index.html"&gt;latest nightly build&lt;/a&gt; of Tungsten Replicator to /tmp.&amp;nbsp;&lt;/li&gt;&lt;li&gt;Untar and cd into the resulting release directory. &amp;nbsp;&lt;/li&gt;&lt;li&gt;Install using the new tungsten-installer &lt;a href="http://datacharmer.blogspot.com/2011/06/getting-started-with-tungsten.html"&gt;as described by Giuseppe Maxia&lt;/a&gt;. &amp;nbsp;&lt;/li&gt;&lt;/ol&gt;Here is an example of set-up commands. &amp;nbsp;My test system uses logos1 as the master and logos2 as the slave. &amp;nbsp;The MySQL DBMS build is Percona Server 5.1.54. &amp;nbsp;Tungsten is installed in /opt/tungsten. 
&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;# Download and unpack build.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;cd /tmp&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;wget --no-check-certificate https://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/tungsten-replicator-2.0.4-154.tar.gz&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;# Untar.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;tar -xvzf tungsten-replicator-2.0.4-154.tar.gz&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;cd tungsten-replicator-2.0.4-154&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;# Set up and start replicators.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;export TUNGSTEN_HOME=/opt/tungsten&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;/tmp/tungsten-replicator-2.0.4-154/tools/tungsten-installer \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--master-slave &amp;nbsp;\&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--master-host=logos1 &amp;nbsp;\&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--datasource-user=tungsten &amp;nbsp;\&lt;/span&gt;&lt;br 
/&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--datasource-password=secret &amp;nbsp;\&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--service-name=percona \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--home-directory=${TUNGSTEN_HOME} \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--cluster-hosts=logos1,logos2 \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--relay-directory=${TUNGSTEN_HOME}/relay-logs \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--datasource-log-directory=/usr/local/percona-5.1.54/data \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--thl-directory=${TUNGSTEN_HOME}/thl-logs \&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;b&gt;&amp;nbsp;&amp;nbsp;--channels=10 \&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;b&gt;&amp;nbsp;&amp;nbsp;--svc-parallelization-type=disk \&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;--start-and-report&lt;/span&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;br /&gt;&lt;br /&gt;Note the bold options 
to select disk queues--"memory" is the other option--and the number of channels. &amp;nbsp;You can confirm you have the right queue installed by running the following command against any slave replicator. &amp;nbsp;You should see the storage class THLParallelQueue in the status output. &lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;$ trepctl -host logos2 status -name stores&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;Processing status command (stores)...&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;NAME &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;VALUE&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;---- &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;-----&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;doChecksum &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;: false&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;logDir &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;: /opt/rhodges4/thl-logs/percona&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;logFileRetention &amp;nbsp;: 7d&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;logFileSize &amp;nbsp; &amp;nbsp; &amp;nbsp; : 100000000&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;maximumStoredSeqNo: 0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, 
monospace;"&gt;minimumStoredSeqNo: 0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;name &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;: thl&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;storeClass &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;: com.continuent.tungsten.replicator.thl.THL&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;NAME &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;VALUE&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;---- &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;-----&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;criticalPartition : -1&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;discardCount &amp;nbsp; &amp;nbsp; &amp;nbsp;: 0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;eventCount &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;: 1&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;headSeqno &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : 0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;maxSize &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : 10&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;name &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;: parallel-queue&lt;/span&gt;&lt;br /&gt;&lt;span 
class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;queues &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;: 10&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;serializationCount: 0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;serialized &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;: false&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;stopRequested &amp;nbsp; &amp;nbsp; : false&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;store.0 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : THLParallelReadTask task_id=0 thread_name=store-thl-0 hi_seqno=0 lo_seqno=0 read=1 discarded=1 events=0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;store.1 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : THLParallelReadTask task_id=1 thread_name=store-thl-1 hi_seqno=0 lo_seqno=0 read=1 discarded=1 events=0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;store.2 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : THLParallelReadTask task_id=2 thread_name=store-thl-2 hi_seqno=0 lo_seqno=0 read=1 discarded=1 events=0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;store.3 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : THLParallelReadTask task_id=3 thread_name=store-thl-3 hi_seqno=0 lo_seqno=0 read=1 discarded=1 events=0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;store.4 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : THLParallelReadTask task_id=4 
thread_name=store-thl-4 hi_seqno=0 lo_seqno=0 read=1 discarded=1 events=0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;store.5 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : THLParallelReadTask task_id=5 thread_name=store-thl-5 hi_seqno=0 lo_seqno=0 read=1 discarded=1 events=0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;store.6 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : THLParallelReadTask task_id=6 thread_name=store-thl-6 hi_seqno=0 lo_seqno=0 read=1 discarded=1 events=0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;store.7 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : THLParallelReadTask task_id=7 thread_name=store-thl-7 hi_seqno=0 lo_seqno=0 read=1 discarded=1 events=0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;store.8 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : THLParallelReadTask task_id=8 thread_name=store-thl-8 hi_seqno=0 lo_seqno=0 read=1 discarded=1 events=0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;store.9 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; : THLParallelReadTask task_id=9 thread_name=store-thl-9 hi_seqno=0 lo_seqno=0 read=1 discarded=0 events=0&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;storeClass &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;: com.continuent.tungsten.replicator.thl.THLParallelQueue&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;syncEnabled &amp;nbsp; &amp;nbsp; &amp;nbsp; : true&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', 
Courier, monospace;"&gt;syncInterval &amp;nbsp; &amp;nbsp; &amp;nbsp;: 2000&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;Finished status command (stores)...&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Now for some fine print. &amp;nbsp;On-disk queues are implemented but are still undergoing QA. &amp;nbsp;There are bugs. &amp;nbsp;The most important problem is performance--the latency is a lot higher than expected on some of our systems, which I suspect is due to an as-yet undiagnosed bug. &amp;nbsp;If you try them out now you can expect to hit a few problems. &amp;nbsp;On the other hand, we take any and all feedback quite seriously, so this is your chance to provide input and help guide the final implementation. &amp;nbsp;Please log issues on the &lt;a href="http://code.google.com/p/tungsten-replicator/issues/list"&gt;Tungsten Replicator issue tracker&lt;/a&gt;&amp;nbsp;or bring up questions on the &lt;a href="http://groups.google.com/group/tungsten-replicator-discuss"&gt;tungsten-discuss mailing list&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Finally, if you would like to learn more about the parallel queue implementation, check out the &lt;a href="http://code.google.com/p/tungsten-replicator/wiki/Parallel_Queue_Architecture"&gt;design documentation on our wiki&lt;/a&gt;&amp;nbsp;as well as the source code. 
&amp;nbsp;They are both pretty readable.</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/5335306551034095533/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=5335306551034095533' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5335306551034095533'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5335306551034095533'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/07/introducing-tungsten-on-disk-queues-for.html' title='Introducing Tungsten On-Disk Queues for Parallel Replication'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-vN8hToV7X2c/ThC2LFSYKEI/AAAAAAAAAHU/o95YzjSI98Q/s72-c/Tungsten-Parallel-Queue-Model.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-3819509900512957218</id><published>2011-05-14T16:49:00.000-07:00</published><updated>2011-05-14T16:54:15.612-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='NoSQL'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Tungsten'/><title type='text'>Introducing MySQL to MongoDB Replication</title><content type='html'>The last article on this blog described our planned &lt;a href="http://scale-out-blog.blogspot.com/2011/05/time-for-sqlnosql-group-hug.html"&gt;MySQL to MongoDB replication hackathon&lt;/a&gt; at the recent&amp;nbsp;&lt;a href="http://opensqlcamp.org/Events/Sardinia2011"&gt;Open DB Camp in Sardinia&lt;/a&gt;. &amp;nbsp;Well, it worked, and the code is now checked into the &lt;a href="http://code.google.com/p/tungsten-replicator/"&gt;Tungsten Replicator project&lt;/a&gt;. &amp;nbsp; This article describes exactly what we did to write the code and set up replication. &amp;nbsp;You can view it as a kind of cookbook both for implementing new database types in Tungsten as well as setting up replication to MongoDB.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Team&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;MySQL to MongoDB replication was a group effort with three people: &amp;nbsp;&lt;a href="http://blog.flaper87.org/"&gt;Flavio Percoco&lt;/a&gt;, Stephane Giron, and me. &amp;nbsp;Flavio has worked on MongoDB for a couple of years and is extremely well-informed both on database setup as well as application design. &amp;nbsp;Stephane Giron is a replication engineer at Continuent and has done a substantial amount of the work on data extraction from MySQL, especially row replication. &amp;nbsp;I work on the core execution framework as well as performance.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Getting Started with MongoDB&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;There were a couple of talks on MongoDB during the first morning of Open DB camp (Saturday May 7th), which Stephane and I dutifully attended to raise our consciousness. &amp;nbsp;We got cracking on implementation around 2pm that afternoon. &amp;nbsp; The first step was to bring up MongoDB 1.8.1 and study its habits with help from Flavio.&lt;br /&gt;&lt;br /&gt;MongoDB is definitely easy to set up. 
&amp;nbsp;You get binary builds from the &lt;a href="http://www.mongodb.org/downloads"&gt;MongoDB download page&lt;/a&gt;. &amp;nbsp;Here is a minimal set of commands to unpack MongoDB 1.8.1 and start mongod using directory &lt;b&gt;data&lt;/b&gt; to hold tables. &amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;code&gt;$ tar -xvzf mongodb-osx-x86_64-1.8.1.tgz&lt;br /&gt;$ cd mongodb-osx-x86_64-1.8.1&lt;br /&gt;$ mkdir data&lt;br /&gt;$ bin/mongod --dbpath data&lt;br /&gt;(... messages ...)&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;You connect to mongod using the mongo client. &amp;nbsp;Here's an example of connecting and creating a table with a single row. &lt;br /&gt;&lt;br /&gt;&lt;code&gt;$ bin/mongo localhost:27017&lt;br /&gt;MongoDB shell version: 1.8.1&lt;br /&gt;connecting to: localhost:27017/test&lt;br /&gt;&amp;gt; use mydb&lt;br /&gt;switched to db mydb&lt;br /&gt;&amp;gt; db.test.insert({"test": "test value", "anumber" : 5 })&lt;br /&gt;&amp;gt; db.test.find()&lt;br /&gt;{ "_id" : ObjectId("4dce9a4f3d6e186ffccdd4bb"), "test" : "test value", "anumber" : 5 }&lt;br /&gt;&amp;gt; exit&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;This is schema-less programming in action. &amp;nbsp;You just insert BSON documents (BSON = Binary JSON) into &lt;i&gt;collections&lt;/i&gt;, which is Mongolese for &lt;i&gt;tables&lt;/i&gt;. &amp;nbsp;MongoDB creates the collection for you as soon as you put something in it. The automatic materialization is quite addictive once you get used to it, which takes about 5 minutes. &lt;br /&gt;&lt;br /&gt;The MongoDB client language is really handy. &amp;nbsp;It is based on JavaScript. &amp;nbsp;There are what seem to be some non-JavaScript commands like "show dbs" to show databases or "show collections" to list the tables. &amp;nbsp;Everything else is object-oriented and easy to understand. 
&amp;nbsp;For example, to find all the records in collection test, as we saw above, you just connect to the database and issue a command on the local db object. &amp;nbsp;Collections appear as properties of db, and operations on the collection are methods. &lt;br /&gt;&lt;br /&gt;It helps that the MongoDB folks provide very accessible documentation, for example a &lt;a href="http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart"&gt;SQL to MongoDB translation chart&lt;/a&gt;. &amp;nbsp; I put together a little practice program using the &lt;a href="http://www.mongodb.org/display/DOCS/Java+Language+Center"&gt;MongoDB Java driver&lt;/a&gt; to insert, referring to the&lt;a href="http://api.mongodb.org/java/2.5.3/"&gt; Javadoc for the class library&lt;/a&gt; when in doubt about API calls. &amp;nbsp;There are also a couple of very helpful examples, &lt;a href="https://github.com/mongodb/mongo-java-driver/blob/master/examples/QuickTour.java"&gt;like this one&lt;/a&gt;, included with the driver.&lt;br /&gt;&lt;br /&gt;All told, setup and orientation took us about 45 minutes. &amp;nbsp;It helped enormously that Flavio is a MongoDB expert, which minimized flail considerably. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Implementing Basic Replication from MySQL to MongoDB&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;After setup we proceeded to implement replication. &amp;nbsp;Here is an overview of the replicator pipeline to move data from MySQL to MongoDB. 
&amp;nbsp;Pipelines are message processing flows within the replicator.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-tGOIq1DiuR8/Tc6gy44k00I/AAAAAAAAAHI/jHLNF2huoBw/s1600/MySQL-to-MongoDB-Pipeline.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="307" src="http://3.bp.blogspot.com/-tGOIq1DiuR8/Tc6gy44k00I/AAAAAAAAAHI/jHLNF2huoBw/s640/MySQL-to-MongoDB-Pipeline.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Direct pipelines move data from one DBMS to another within a single replicator. &amp;nbsp;They are already a standard part of Tungsten Replicator and most of the code shown above already exists, except for the parts shown in red. &amp;nbsp; Before we started, we therefore needed to set up a replicator with a direct pipeline. &lt;br /&gt;&lt;br /&gt;We built the code according to the &lt;a href="http://code.google.com/p/tungsten-replicator/wiki/HowToBuild"&gt;instructions on the Tungsten project wiki&lt;/a&gt;, uploaded the binary to our test host, and configured the replicator. &amp;nbsp; First, we&amp;nbsp;ran the Tungsten &lt;b&gt;configure&lt;/b&gt; script to set defaults for the MySQL server (user name, extract method, etc.). &amp;nbsp; Next, we ran the configure-service command to set up the direct pipeline configuration file. &amp;nbsp; Both commands together look like the following:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;./configure&lt;br /&gt;./configure-service&amp;nbsp;-C --role=direct mongodb&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;The second command created a file called tungsten-replicator/conf/static-mongodb.properties with all the information about the direct pipeline implementation but of course nothing yet about MongoDB.&lt;br /&gt;&lt;br /&gt;Now we could start the implementation. 
&amp;nbsp;&amp;nbsp;To move data to MongoDB, we needed two new components:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;A Tungsten&amp;nbsp;&lt;a href="http://code.google.com/p/tungsten-replicator/source/browse/trunk/replicator/src/java/com/continuent/tungsten/replicator/applier/RawApplier.java"&gt;RawApplier&lt;/a&gt;&amp;nbsp;to apply row updates to MongoDB. &amp;nbsp;RawApplier is the basic interface you implement to create an applier to a database. &amp;nbsp;&lt;/li&gt;&lt;li&gt;A Tungsten &lt;a href="http://code.google.com/p/tungsten-replicator/source/browse/trunk/replicator/src/java/com/continuent/tungsten/replicator/filter/Filter.java"&gt;Filter&lt;/a&gt; to stick column names on row updates after extracting from MySQL. &amp;nbsp;MySQL row replication does not do this automatically, which makes it difficult to construct JSON at the other end because you do not have the right property names.&amp;nbsp;&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;To get started on the applier I implemented&amp;nbsp;a very simple class named &lt;a href="http://code.google.com/p/tungsten-replicator/source/browse/trunk/replicator/src/java/com/continuent/tungsten/replicator/applier/MongoApplier.java"&gt;MongoApplier&lt;/a&gt; that could take an insert from MySQL, turn it into a BSON document, and add it to an equivalently named database and collection in MongoDB. &amp;nbsp;I added this to the replicator code tree, then built and uploaded tungsten-replicator.jar. &amp;nbsp;(Use 'ant dist' in the replicator directory to build the JAR.)&lt;br /&gt;&lt;br /&gt;To start using the new MongoDB applier, we needed to edit the service properties file to use this component instead of the standard MySQL applier that configuration adds by default. &amp;nbsp;To do this, you can open up static-mongodb.properties with your favorite editor. &amp;nbsp;Add the following properties at the bottom of the APPLIERS section.&lt;br /&gt;&lt;code&gt;&lt;br /&gt;# MongoDB applier. 
&amp;nbsp;You must specify a connection string for the server. &lt;br /&gt;# This currently supports only a single server. &lt;br /&gt;replicator.applier.mongodb=com.continuent.tungsten.replicator.applier.MongoApplier&lt;br /&gt;replicator.applier.mongodb.connectString=localhost:27017&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Next, you need to fix up the direct pipeline so that the last stage uses the new applier. &amp;nbsp;We located the direct pipeline definition (around line 208 in the properties file) and set the applier to mongodb as shown in the following example.&lt;br /&gt;&lt;code&gt;&lt;br /&gt;# Write from parallel queue to database.&lt;br /&gt;replicator.stage.d-pq-to-dbms=com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask&lt;br /&gt;replicator.stage.d-pq-to-dbms.extractor=parallel-q-extractor&lt;br /&gt;&lt;b&gt;replicator.stage.d-pq-to-dbms.applier=mongodb&lt;/b&gt;&lt;br /&gt;replicator.stage.d-pq-to-dbms.filters=mysqlsessions&lt;br /&gt;replicator.stage.d-pq-to-dbms.taskCount=${replicator.global.apply.channels}&lt;br /&gt;replicator.stage.d-pq-to-dbms.blockCommitRowCount=${replicator.global.buffer.size}&lt;/code&gt;&lt;/div&gt;&lt;br /&gt;We then started the replicator using 'replicator start.' &amp;nbsp;At that point we could do the following on MySQL:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;mysql&amp;gt; create table foo(id int primary key, msg varchar(35));&lt;br /&gt;Query OK, 0 rows affected (0.05 sec)&lt;br /&gt;mysql&amp;gt; insert into foo values(1, 'hello from MySQL!');&lt;br /&gt;Query OK, 1 row affected (0.00 sec)&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;...And within a second we could see the following over in MongoDB:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&amp;gt; show collections&lt;br /&gt;foo&lt;br /&gt;system.indexes&lt;br /&gt;&amp;gt; db.foo.find();&lt;br /&gt;{ "_id" : ObjectId("4dc55e45ad90a25b9b57909d"), "1" : "1", "2" : "hello from MySQL!" 
}&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;This kind of progress was very encouraging. &amp;nbsp;It took roughly 2 hours to move the first inserts across. &amp;nbsp;Compared to replicating to a new SQL database like Oracle, that's lightning fast. &amp;nbsp;However, there were still no property names because we were not adding column names to row updates.&lt;br /&gt;&lt;br /&gt;Meanwhile, Stephane had finished the column name filter (&lt;a href="http://code.google.com/p/tungsten-replicator/source/browse/trunk/replicator/src/java/com/continuent/tungsten/replicator/filter/ColumnNameFilter.java"&gt;ColumnNameFilter&lt;/a&gt;) and checked it in. &amp;nbsp;I rebuilt and refreshed the replicator code, then edited static-mongodb.properties as follows to add the filter. &amp;nbsp;First, put the filter definition in the FILTERS section:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;# Column name filter. &amp;nbsp;Adds column name metadata to row updates. &amp;nbsp;This is&lt;br /&gt;# required for MySQL row replication if you have logic that requires column&lt;br /&gt;# names.&lt;br /&gt;replicator.filter.colnames=com.continuent.tungsten.replicator.filter.ColumnNameFilter&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;Next, make the first stage of the direct pipeline use the filter:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;# Extract from binlog into queue.&lt;br /&gt;replicator.stage.d-binlog-to-q=com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask&lt;br /&gt;replicator.stage.d-binlog-to-q.extractor=mysql&lt;br /&gt;&lt;b&gt;replicator.stage.d-binlog-to-q.filters=colnames,pkey&lt;/b&gt;&lt;br /&gt;replicator.stage.d-binlog-to-q.applier=queue&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;We then restarted the replicator. 
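&lt;br /&gt;&lt;br /&gt;To show what the column name metadata contributes, here is a small Python sketch of turning one row update into a document. It is an illustration only and not the MongoApplier or ColumnNameFilter code, which is Java. The row_to_document function is a hypothetical helper. Turning values into strings simply mirrors the sample documents in this article and is not a documented rule.

```python
def row_to_document(values, column_names=None):
    # Without names, fall back to 1-based positions as keys, which matches
    # the shape of the first document shown above ("1", "2").
    if column_names is None:
        column_names = [str(i + 1) for i in range(len(values))]
    if len(column_names) != len(values):
        raise ValueError("column count does not match value count")
    # Pair each column name with its value to form the document body.
    return {name: str(value) for name, value in zip(column_names, values)}


if __name__ == "__main__":
    print(row_to_document([1, "hello from MySQL!"]))
    print(row_to_document([1, "hello from MySQL!"], ["id", "msg"]))
```

With names attached, the keys become the MySQL column names instead of positions, which is all the applier needs to build usable property names.&lt;br /&gt;&lt;br /&gt;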
&amp;nbsp;Thereupon, we started to see inserts like the following, complete with property names:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&amp;gt; db.foo.find()&lt;br /&gt;{ "_id" : ObjectId("4dc77bacad9092bd1aef046d"), "id" : "25", "data" : "data value" }&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;That was better, much better! &amp;nbsp; To this point we had put in exactly 2 hours and 45 minutes wall clock time. &amp;nbsp;It was enough to prove the point and more than enough for a demo the next day. &amp;nbsp; The hackathon was a rousing success. &lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;b&gt;Further Development&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Over the next couple of days I rounded out the MongoApplier to add support for UPDATE and DELETE operations, as well as to implement restart. &amp;nbsp;The full implementation is now checked in on code.google.com, so you can repeat our experiences by downloading code and building yourself or by grabbing one of the Tungsten nightly builds. &lt;br /&gt;&lt;br /&gt;Restart is an interesting topic. &amp;nbsp;Tungsten uses a table to store the sequence number of the last transaction it applied. &amp;nbsp;We do this by creating an equivalent collection in MongoDB, which is updated after each commit. &amp;nbsp;There is a problem in that MongoDB does not have transactions. &amp;nbsp;Each update is effectively auto-commit, much like MyISAM table type on MySQL. &amp;nbsp;This means that while Tungsten can restart properly after a clean shutdown, slave replication is not crash safe. &amp;nbsp;Lack of atomic transactions is&amp;nbsp;a bigger issue with MongoDB and other NoSQL databases that goes far beyond replication. &amp;nbsp;For now, this is just how Tungsten's MongoDB support works.&lt;br /&gt;&lt;br /&gt;Speaking of things that don't work, the current implementation is a prototype only. &amp;nbsp;We have not tested it with more than a few data types. &amp;nbsp;It only works with a single MongoDB daemon. 
&amp;nbsp;It does not set keys properly or specify indexes on tables. &amp;nbsp;There are no guarantees about performance, except to say that if you had more than a small amount of data it would be quite slow. &amp;nbsp;(OK, that's a guarantee after all.) &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Epilog&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Overall, the hackathon was a great success, not to mention lots of fun. &amp;nbsp;It went especially well because we had a relatively small problem and three people (Stephane, Flavio, and Robert) with complementary skills that we could combine easily for a quick solution. &amp;nbsp;That seems to be a recipe for succeeding in future hackathons. &lt;br /&gt;&lt;br /&gt;From a technical point of view, it helped that MongoDB is schema-less. &amp;nbsp;Unlike in SQL databases, just adding a document materializes the table in MongoDB. &amp;nbsp;This made our applier implementation almost trivially easy, because processing row updates takes only a few dozen lines of Java code in total. &amp;nbsp;It also explains why a lot of people are quite attached to the NoSQL programming model. &lt;br /&gt;&lt;br /&gt;I am looking forward to learning a lot more about MongoDB and other NoSQL databases. &amp;nbsp;It would take two or three weeks of work to get our prototype to work with real applications. &amp;nbsp;Also, it looks as if we can implement replication going from MongoDB to MySQL. &amp;nbsp;According to Flavio there is a way to search the transaction log of MongoDB as a regular collection. &amp;nbsp;By appropriately transforming BSON objects back to SQL tuples, we can offer replication back to MySQL. &lt;br /&gt;&lt;br /&gt;There are many other lessons about MongoDB and NoSQL in general but it seems best to leave them for a future article when I have more experience and actually know what I'm talking about. 
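&lt;br /&gt;&lt;br /&gt;To make the reverse direction mentioned above a little more concrete, here is a rough Python sketch of projecting a document onto a SQL tuple. Nothing like this exists in Tungsten yet. The document_to_insert helper is hypothetical, SQLite stands in for MySQL so the example is self-contained, and the table and column names are assumed to be trusted because they are pasted into the statement text.

```python
import sqlite3


def document_to_insert(table, document, columns):
    # Hypothetical helper: project a document onto a fixed column list.
    # Missing fields become NULL; the MongoDB "_id" field is ignored unless
    # it is listed in columns.
    values = tuple(document.get(col) for col in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, ", ".join(columns), placeholders)
    return sql, values


conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, msg TEXT)")
document = {"_id": "abc", "id": 1, "msg": "hello from MongoDB!"}
sql, values = document_to_insert("foo", document, ["id", "msg"])
conn.execute(sql, values)
rows = conn.execute("SELECT id, msg FROM foo").fetchall()
print(rows)
```

The hard part in practice would be deciding the column list, since documents in one collection need not share the same fields.&lt;br /&gt;&lt;br /&gt;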
&amp;nbsp;Meanwhile, you are welcome to try out our newest Tungsten replication feature.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-3819509900512957218?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/3819509900512957218/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=3819509900512957218' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3819509900512957218'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3819509900512957218'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/05/introducing-mysql-to-mongodb.html' title='Introducing MySQL to MongoDB Replication'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-tGOIq1DiuR8/Tc6gy44k00I/AAAAAAAAAHI/jHLNF2huoBw/s72-c/MySQL-to-MongoDB-Pipeline.jpg' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-2444798738935647056</id><published>2011-05-03T07:13:00.000-07:00</published><updated>2011-05-03T07:13:58.446-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='NoSQL'/><title type='text'>Time for a SQL/NoSQL 
Group Hug</title><content type='html'>&lt;div&gt;&lt;a href="http://opensqlcamp.org/Events/Sardinia2011"&gt;European Open Database Camp 2011&lt;/a&gt; is this weekend in the hills above Cagliari, Sardinia. &amp;nbsp;In honor of the increasing number of sites that use both NoSQL and SQL databases, I am going to be running a &lt;a href="http://opensqlcamp.org/Events/Sardinia2011/Sessions#SQL_to_NoSQL_Replication_Hackathon_-_Robert_Hodges"&gt;MySQL to NoSQL Hackathon&lt;/a&gt; to prototype Tungsten Replicator support for transferring data from MySQL to MongoDB. &amp;nbsp;The conference will have &lt;a href="http://opensqlcamp.org/Events/Sardinia2011/Sessions#Mongodb.2C_The_When.2C_Why_and_What_-_Flavio_.5BFlaPer87.5D_Percoco_Premoli"&gt;at least one well-informed MongoDB expert&lt;/a&gt;, so we should have enough critical mass to get this done. &amp;nbsp;It helps that I'll be completely jet-lagged after flying in from the US and unable to sleep anyway. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;Over the past year SQL vs. NoSQL rants have started to abate as people get down to the practical work of making both types of systems work most effectively, often within the same site if not the same application. &amp;nbsp; I have spoken to a small but growing number of users who want to move data between MySQL and Cassandra, HBase, MongoDB, and others. &amp;nbsp;(Here's a &lt;a href="http://stackoverflow.com/questions/3801171/how-do-i-set-up-replication-from-mysql-to-mongodb"&gt;typical example from StackOverflow&lt;/a&gt;.) &amp;nbsp;Some of them are even willing to pay money for Tungsten Replicator implementations. &amp;nbsp;You know there is a real need when that happens. &amp;nbsp;I expect such requests will grow as more applications implement NoSQL stores and have to set up feeds to or from SQL. &amp;nbsp; Heterogeneous replication in this sense is a good proxy for solution maturity, and at least a couple of the NoSQL stores seem to be getting there. 
&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;The MySQL to MongoDB hackathon is our first crack at this problem. &amp;nbsp;There is no guarantee the results will be especially usable, because Tungsten will not have a built-in solution for SQL to JSON mapping. &amp;nbsp;(I'm still noodling about that problem--suggestions welcome.) &amp;nbsp;Still, we will start to understand the problem space better and pave the way for more robust solutions over the course of the next year or so. &amp;nbsp;I'm looking forward to learning a lot more about non-SQL stores along the way.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If you are attending the conference you can hear about our results in person. &amp;nbsp;You can also tune into the #tungsten IRC channel at irc.freenode.net starting Friday May 5th around 10pm GMT. &amp;nbsp;I'll post regular updates over the weekend. &amp;nbsp;Feel free to drop by on IRC to see how we are doing or even to help out. &amp;nbsp;We will not single-handedly end the SQL/NoSQL wars, but at least we can help them learn how to share.&amp;nbsp;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-2444798738935647056?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/2444798738935647056/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=2444798738935647056' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/2444798738935647056'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/2444798738935647056'/><link rel='alternate' type='text/html' 
href='http://scale-out-blog.blogspot.com/2011/05/time-for-sqlnosql-group-hug.html' title='Time for a SQL/NoSQL Group Hug'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-1584284761200198739</id><published>2011-05-02T16:53:00.000-07:00</published><updated>2011-05-02T18:31:16.126-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><title type='text'>Tungsten Supports Logical Replication on PostgreSQL</title><content type='html'>Thanks to my colleague Linas Virbalas, &lt;a href="http://code.google.com/p/tungsten-replicator"&gt;Tungsten Replicator&lt;/a&gt; has just taken the next step to support full logical replication for PostgreSQL. &amp;nbsp;Linas posted an article today on his new blog &lt;a href="http://flyingclusters.blogspot.com/2011/05/advanced-logical-replication-for.html"&gt;describing PostgreSQL logical replication using SLONY triggers&lt;/a&gt;. &amp;nbsp;I saw a demo of his implementation and was really impressed. &amp;nbsp; For more information you should read the article, which provides an excellent description of how Tungsten replicates from &lt;a href="http://slony.info/"&gt;SLONY&lt;/a&gt; logs. &lt;br /&gt;&lt;br /&gt;It is pretty exciting whenever Tungsten replicates data to or from a new DBMS type, but PostgreSQL logical replication is really special. 
&amp;nbsp;Tungsten Replicator has been able to manage native &lt;a href="http://www.postgresql.org/docs/9.0/interactive/warm-standby.html"&gt;warm standby&lt;/a&gt; and &lt;a href="http://www.postgresql.org/docs/9.0/interactive/warm-standby.html#STREAMING-REPLICATION"&gt;log streaming&lt;/a&gt; replication for more than a year using a script-based plugin. &amp;nbsp;This is fine for offering copies for failover but rather limited for problems beyond simple availability. &amp;nbsp;Logical replication opens up a whole new set of solutions that include multi-master replication, heterogeneous data transfer, and large-scale read scaling on replicas. &amp;nbsp;It is also a key building block for advanced clustering capabilities like zero-downtime upgrade as well as for building multi-tenant systems for SaaS applications. &amp;nbsp;After years of working on these problems for MySQL we are glad to finally attack them head on for PostgreSQL as well.&lt;br /&gt;&lt;br /&gt;There is still a lot of work to achieve fully functional, easy-to-use PostgreSQL logical replication, but the work that Linas described gives Tungsten a clear path forward. &amp;nbsp;I expect we will make rapid progress, because so many of the other parts of Tungsten Replicator are already in place. 
&amp;nbsp;Meanwhile, Linas has put together a very readable blog that should make interesting reading for years to come.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-1584284761200198739?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/1584284761200198739/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=1584284761200198739' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/1584284761200198739'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/1584284761200198739'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/05/tungsten-supports-logical-replication.html' title='Tungsten Supports Logical Replication on PostgreSQL'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-5238021469903000399</id><published>2011-04-20T17:33:00.000-07:00</published><updated>2011-04-20T17:33:28.097-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='Tungsten'/><title type='text'>Belated Thanks to MySQL Community</title><content type='html'>Tungsten Replicator won &lt;a 
href="http://openlife.cc/blogs/2011/april/oreilly-mysql-conference-community-awards-2011-winners-are"&gt;O'Reilly Application of the Year at the 2011 O'Reilly MySQL Conference&lt;/a&gt;, together with &lt;a href="http://www.percona.com/software/percona-xtrabackup/"&gt;Percona's XtraBackup&lt;/a&gt;. &amp;nbsp;&lt;a href="http://datacharmer.org/"&gt;Giuseppe Maxia&lt;/a&gt; also received an award for Community Contributor of the Year.  Having now worked with Giuseppe for almost half a year I know from personal experience his award is truly deserved. &amp;nbsp;All in all we had a very good week, especially since the replicator award was a complete surprise. &lt;br /&gt;&lt;br /&gt;Things were so busy during and after the MySQL conference it was difficult to write a timely thank-you note.  I hope it's not too late to thank the committee now for both awards.  &lt;br /&gt;&lt;br /&gt;More importantly, I would like to thank the MySQL community as a whole. &amp;nbsp;Replicated data is the lifeblood of MySQL applications. &amp;nbsp;There has been a long history of innovation both within the MySQL engineering team and in the community as a whole. &amp;nbsp;Working on replication for MySQL is a bit like building cars to operate on the German Autobahn. 
&amp;nbsp;If you can compete here you can compete anywhere.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-5238021469903000399?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/5238021469903000399/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=5238021469903000399' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5238021469903000399'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5238021469903000399'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/04/belated-thanks-to-mysql-community.html' title='Belated Thanks to MySQL Community'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-4808986853339021855</id><published>2011-04-14T12:22:00.000-07:00</published><updated>2011-04-14T18:22:54.338-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='Tungsten'/><title type='text'>Settling in at code.google.com</title><content type='html'>Tungsten Replicator code is now fully open source and published on code.google.com.&amp;nbsp; Here is our new home in case you do not yet know it:&amp;nbsp;&amp;nbsp;&lt;a 
href="http://code.google.com/p/tungsten-replicator/"&gt;http://code.google.com/p/tungsten-replicator&lt;/a&gt;. &amp;nbsp;I hope you will visit our new digs and admire the furniture. &lt;br /&gt;&lt;br /&gt;The fact that the replicator is now fully open source under GPL V2 is kind of old news, so I would instead like to talk about something else: &amp;nbsp;our initial experience setting up the replicator project at&amp;nbsp;&lt;a href="http://code.google.com/"&gt;code.google.com&lt;/a&gt;. &amp;nbsp;In a nutshell, it has been excellent. &amp;nbsp; There are several things that stand out.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The site is incredibly easy to use. &amp;nbsp; You can customize the home page, add members, add external links, etc. quickly and without having to resort to help.&amp;nbsp;&lt;/li&gt;&lt;li&gt;It has everything we need. &amp;nbsp;The front page is excellent--clean but also all the information users need to get started. &amp;nbsp;Useful features like issue trackers and Google Groups are cleanly integrated. &amp;nbsp;&amp;nbsp;&lt;/li&gt;&lt;li&gt;It is very fast. &amp;nbsp;&lt;/li&gt;&lt;li&gt;So far it seems to have just the right mix of open and closed for our project. &amp;nbsp;Anybody can post to the groups or log issues, but only committers on the project have write access to code and ability to move bugs through issue status.&amp;nbsp;&lt;/li&gt;&lt;/ol&gt;The only problems I have run into personally involve SVN code access. 
&amp;nbsp;For example, say you check out using the http rather than https URL as in:&lt;br /&gt;&lt;code&gt;svn co http://tungsten-replicator.googlecode.com/svn/trunk/builder&lt;/code&gt;&lt;br /&gt;&lt;div&gt;If you edit something and try to check in you get a message like the following:&amp;nbsp;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;$ svn commit -m "This does not work"&lt;br /&gt;svn: Commit failed (details follow):&lt;br /&gt;svn: Server sent unexpected return value (405 Method Not Allowed) in response to MKACTIVITY request for '/svn/!svn/act/8d94e398-83ba-46f3-aae2-bd10cb707c4b'&lt;br /&gt;svn: Your commit message was left in a temporary file:&lt;br /&gt;svn:    '/home/rhodges/google/tungsten-replicator/builder/svn-commit.tmp'&lt;/code&gt;&lt;/pre&gt;This message is definitely in the "not helpful" category. &amp;nbsp;Perhaps it is some sort of defense against evildoers. &amp;nbsp;However, this might be subversion behavior and nothing to do with Google. &amp;nbsp;If you receive such a message, run &lt;b&gt;svn info&lt;/b&gt; to&amp;nbsp;check the SVN URL. &amp;nbsp;If you see http instead of https you have found the cause. &amp;nbsp; Unfortunately the cure seems to be to check out again properly in another location, copy in your changes, and then commit.&lt;br /&gt;&lt;br /&gt;Site credentials are a more insidious problem. &amp;nbsp;Android phone users need to have a Google GMail account to access updates and download apps. &amp;nbsp;(At least that's true for my provider.) &amp;nbsp;Browsers like Firefox do not keep accounts separated properly, so you may run into account confusion when you first get started. &amp;nbsp;On Mac OS X you can get the wrong account in your keychain, which in turn leads to more confusing error messages. &amp;nbsp;This &lt;i&gt;&lt;u&gt;is&lt;/u&gt;&lt;/i&gt; a Google problem. 
&amp;nbsp;There is a creeping form of web single-signon using Google, Facebook, and other accounts as identifiers with unintended side-effects for work and personal interactions. &amp;nbsp;It makes you wonder what other problems are out there.&lt;br /&gt;&lt;br /&gt;But I digress.  As far as the project is concerned the issues look pretty minor. &amp;nbsp;At this point I would recommend code.google.com wholeheartedly for open source projects. &lt;br /&gt;&lt;br /&gt;p.s., Tungsten Replicator 2.0.2 is on the way. &amp;nbsp;More on that in another post.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-4808986853339021855?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/4808986853339021855/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=4808986853339021855' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/4808986853339021855'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/4808986853339021855'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/04/settling-in-at-codegooglecom.html' title='Settling in at code.google.com'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-8849272493816537567</id><published>2011-04-02T10:50:00.000-07:00</published><updated>2011-04-02T10:50:00.249-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><title type='text'>O'Reilly Conference Tungsten Talks and Some Welcome Open Source Progress</title><content type='html'>The &lt;a href="http://en.oreilly.com/mysql2011"&gt;O'Reilly MySQL 2011 conference&lt;/a&gt; is coming up fast. &amp;nbsp;It should be a good conference as it covers the increasingly diverse MySQL community and MySQL alternatives very well. &amp;nbsp; As usual, there are some painful choices about which talks to attend. &amp;nbsp;I'm doing two talks myself that I hope you have on your list:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://en.oreilly.com/mysql2011/public/schedule/detail/19268"&gt;Curing Replication Deprivation with Tungsten&lt;/a&gt; -- A tutorial together with my colleague Ed Archibald. &amp;nbsp;It covers everything you ever wanted to know about how to use parallel replication, handle multi-master/multi-source, replication to PostgreSQL/Oracle, etc. &amp;nbsp;We will have a short section at the end about how to build full clusters with Tungsten Enterprise. &amp;nbsp; Giuseppe Maxia is threatening to join and do some of his famous demos. &amp;nbsp;Apparently doing a tutorial on replication in the morning is not enough to tire him out.&amp;nbsp;&lt;/li&gt;&lt;li&gt;&lt;a class="url uid" href="http://en.oreilly.com/mysql2011/public/schedule/detail/17259" name="session17259"&gt;Preparing for the Big Oops:  How to Build Disaster Recovery Sites for MySQL&lt;/a&gt;&amp;nbsp;-- Survey of methods as well as things to have in mind when building a disaster recovery site. 
&amp;nbsp;This will cover everything I can think of, not just Tungsten. &amp;nbsp;&lt;/li&gt;&lt;/ul&gt;If you do not get your fill from our tutorial, Ed Archibald will also be doing another talk on Tungsten Enterprise&amp;nbsp;&lt;a href="http://en.oreilly.com/mysql2011/public/schedule/detail/19742"&gt;explaining how you can build Database-as-a-service using Tungsten&lt;/a&gt;. &amp;nbsp;&amp;nbsp;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Finally, the open source news. &amp;nbsp;We have been working on getting Tungsten Replicator fully open sourced including features like parallel replication, cool multi-master capabilities, and our fast disk log. &amp;nbsp;Getting there required jumping through a few hoops internally, but I'm happy to say the jumping is done. &amp;nbsp;As soon as we finish a couple of merges between branches we will post a full copy of Tungsten Replicator 2.0 on &lt;a href="http://code.google.com/p/tungsten-replicator/"&gt;our new home at code.google.com&lt;/a&gt;&amp;nbsp;under a GPL V2 license. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It will take us a while to get used to the new bug tracker, wikis, and forums based on Google Groups, but we should be starting to settle in by 7 April when Giuseppe presents &lt;a href="http://www.sfmysql.org/events/16826186/"&gt;Advanced MySQL Replication for the Masses&lt;/a&gt;&amp;nbsp;at the MySQL SFO meetup. &amp;nbsp;Come visit us! 
&amp;nbsp;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-8849272493816537567?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/8849272493816537567/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=8849272493816537567' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/8849272493816537567'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/8849272493816537567'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/04/oreilly-conference-tungsten-talks-and.html' title='O&apos;Reilly Conference Tungsten Talks and Some Welcome Open Source Progress'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-4551905500265352013</id><published>2011-03-30T02:08:00.000-07:00</published><updated>2011-03-30T02:08:00.371-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='SaaS'/><title type='text'>Slouching towards Multi-Master Conflict Resolution</title><content type='html'>This title is not nearly as snappy as Yeats' line from "&lt;a href="http://en.wikisource.org/wiki/The_Second_Coming"&gt;The Second Coming&lt;/a&gt;," but it will have 
to do.&amp;nbsp; Conflict resolution has been a holy grail for us on Tungsten--we had people asking for it when MySQL was still MySQL AB, lo these many years ago. &amp;nbsp; Well, it's finally starting to happen after about a year of work. &amp;nbsp;&lt;br /&gt;&lt;br /&gt;Let's start with a simple multi-master configuration. &amp;nbsp;To replicate bi-directionally between two masters, you typically run Tungsten replicator on each master with two replication services on each master. &amp;nbsp;The first &lt;i&gt;local service&lt;/i&gt; reads the master log. &amp;nbsp;The second &lt;i&gt;remote service&lt;/i&gt;&amp;nbsp;is a slave of the other master. &amp;nbsp;It looks like the following picture:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-UKovv7KTaEg/TZLiGAJ9DbI/AAAAAAAAAHE/lK0WoY168bs/s1600/multi-master-bidi.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="115" src="http://4.bp.blogspot.com/-UKovv7KTaEg/TZLiGAJ9DbI/AAAAAAAAAHE/lK0WoY168bs/s400/multi-master-bidi.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;One of the big problems in multi-master replication is to avoid loops. &amp;nbsp;Let's say you update something in the SJC master. &amp;nbsp;The transaction replicates to NYC and appears in the log. &amp;nbsp;Then it wants to replicate back to SJC and appear in the log there. &amp;nbsp;If we don't do anything, the poor transaction will loop forever. &lt;br /&gt;&lt;br /&gt;Tungsten solves this problem using a filter named BidiRemoteSlaveFilter that runs on slave pipelines. &amp;nbsp;It has a simple job, which is to identify where transactions originated and keep local transactions that are returning from the other master from being applied again. 
&amp;nbsp;We use a variety of tricks to "tag" SQL in a way that allows us to deduce the origin--the most common are comments added to SQL statements or specially formatted row updates that we add to user transactions. &amp;nbsp;As long as you set things up properly and don't break some simple rules you can now replicate bi-directionally between 2 or more masters. &lt;br /&gt;&lt;br /&gt;This brings us to conflict resolution. &amp;nbsp;Conflict resolution prevents incompatible transactions from colliding with each other. &amp;nbsp;The BidiRemoteSlaveFilter does a simple form of conflict resolution by preventing transactions from the local service from looping back and being applied again. &amp;nbsp;However, it's a very short hop to filters that address application conflicts. &lt;br /&gt;&lt;br /&gt;Here's a simple example that is next on my list to implement. &amp;nbsp;It is not unusual to split customers in multi-tenant applications across sites so that they have active updates on only one site and backup copies on the others. &amp;nbsp;You could imagine a filter that works off a simple file like the following:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;[sjc]&lt;br /&gt;   sharks&lt;br /&gt;   athletics&lt;br /&gt;[nyc]&lt;br /&gt;   mets&lt;br /&gt;   yankees&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;This tells the filter that transactions with shard ID sharks and athletics are allowed on the sjc master from the local application. &amp;nbsp;However, if the nyc master updates these, we will reject the updates when the nyc remote service tries to apply them and generate an error message in the log, or perhaps put them in a special log file for later inspection. &amp;nbsp;You could even generate a dummy update on the local master that would result in the sjc data being sent back over to correct the nyc DBMS information via replication. &lt;br /&gt;&lt;br /&gt;What we have just done is implement conflict resolution for a system-of-record approach to multi-site data management. 
&amp;nbsp; There are many types of conflicts as well as ways to manage them. &amp;nbsp;Tungsten Replicator filters have a lot of potential to implement other schemes as well. &amp;nbsp;Filters are pluggable, so there is a convenient escape hatch if you need to do specialized rules of your own. &amp;nbsp;Meanwhile, there is plenty of scope for Tungsten development to provide useful conflict resolution mechanisms.&lt;br /&gt;&lt;br /&gt;The Yeats poem I referred to at the beginning of the article is one of my all-time favorites. &amp;nbsp;Here are the last two lines:&lt;br /&gt;&lt;pre&gt;And what rough beast, its hour come round at last,&lt;br /&gt;Slouches towards Bethlehem to be born?&lt;/pre&gt;This could certainly describe a lot of software projects. &amp;nbsp; However, Tungsten is not like that at all. &amp;nbsp;We Tungsten engineers wear white lab jackets with pocket protectors and have shiny clipboards to take notes on our breakthroughs. &amp;nbsp;We rarely slouch.&lt;br /&gt;&lt;br /&gt;P.s. Speaking of Tungsten engineers my colleague &lt;a href="http://datacharmer.org/"&gt;Giuseppe Maxia&lt;/a&gt; and I will be&amp;nbsp;doing a webinar on multi-master replication on Thursday March 31st. &amp;nbsp;It will be mostly technical with only a small amount of marketing fluff. &amp;nbsp;As usual Giuseppe has cooked up a cool demo. 
&amp;nbsp;Sign up at &lt;a href="http://www.continuent.com/"&gt;www.continuent.com&lt;/a&gt; if you would like to find out more.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-4551905500265352013?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/4551905500265352013/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=4551905500265352013' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/4551905500265352013'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/4551905500265352013'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/03/slouching-towards-multi-master-conflict.html' title='Slouching towards Multi-Master Conflict Resolution'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-UKovv7KTaEg/TZLiGAJ9DbI/AAAAAAAAAHE/lK0WoY168bs/s72-c/multi-master-bidi.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-5488735872437270768</id><published>2011-03-22T22:12:00.000-07:00</published><updated>2011-03-22T22:12:01.048-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Drizzle'/><title type='text'>Parallel Replication Using Shards Is the Only Workable Approach for SQL</title><content type='html'>There have been a couple of recent blog articles (&lt;a href="http://www.tusacentral.net/joomla/index.php/mysql-blogs/96-a-dream-on-mysql-parallel-replication"&gt;here&lt;/a&gt; and &lt;a href="http://openlife.cc/blogs/2011/march/parallelizing-mysql-replication-slave"&gt;here&lt;/a&gt;) asking for parallel replication based on something other than schemas. &amp;nbsp;These articles both focus on the problem of parallelizing updates within a single MySQL schema.&amp;nbsp;&amp;nbsp;I read these with great interest, not least because they both mentioned&amp;nbsp;&lt;a href="http://code.google.com/p/tungsten-replicator/"&gt;Tungsten&lt;/a&gt;&amp;nbsp;(thanks!) and also found that our&amp;nbsp;schema-based parallelization approach is too limited. &amp;nbsp;It is therefore worth a short article explaining exactly what the Tungsten approach is and why we chose it. &lt;br /&gt;&lt;br /&gt;First of all, Tungsten does not exactly use schema-based parallel replication. &amp;nbsp;Tungsten is actually based on what I call the &lt;i&gt;serialized shard&lt;/i&gt; model of replication. &amp;nbsp;We assign global transaction IDs to all transactions, which means that for any particular set of transactions we can always figure out the correct serialization and apply in the right order. &amp;nbsp;This is true even if the transactions travel across independent replication paths or if we have master failover. &lt;br /&gt;&lt;br /&gt;Second, we assign a shard ID to all transactions. &amp;nbsp;Shards are independent streams of transactions that execute correctly when applied by themselves in serial order. &amp;nbsp;Shards are typically independent, which means transactions in different shards can execute in parallel without deadlocking or corrupting data. 
&amp;nbsp;This is the case when each shard contains data for a single customer in a multi-tenant application. &amp;nbsp;We also have a notion of "critical shards," which are shards that contain global data, such as shared currency rate tables. &amp;nbsp;Updates in critical shards cause full serialization across all shards. &amp;nbsp;&lt;br /&gt;&lt;br /&gt;You can define shards in a variety of ways, but as a practical matter identifying durable shards inside individual MySQL schemas is hard for most applications, especially if there are constraints between tables or you have large transactions. &amp;nbsp; Many SQL applications tend to make most of their updates to a small number of very large tables, which makes finding stable dividing lines even more difficult. &amp;nbsp;Schemas are therefore a natural unit of sharding, and Tungsten uses these by default. &lt;br /&gt;&lt;br /&gt;Schema-based sharding seems pretty limiting, but for current SQL databases it is really the only approach that works. &amp;nbsp;Here are some important reasons that give you a flavor of the issues.&lt;br /&gt;&lt;br /&gt;* &lt;b&gt;Restart&lt;/b&gt;. &amp;nbsp;To handle failures you need to mark the exact restart point on each replication apply thread or you will either repeat or miss transactions. &amp;nbsp;This requires precise and repeatable serialization on each thread, which you get with the serialized shard model.&lt;br /&gt;&lt;br /&gt;* &lt;b&gt;Deadlocks&lt;/b&gt;. &amp;nbsp;If there are conflicts between updates you will quickly hit deadlocks. &amp;nbsp;This is especially true because one of the biggest single-thread replication optimizations is block commit, where you commit dozens of successive transactions at once--it can raise throughput by 100% in some cases. &amp;nbsp;Deadlocks, on the other hand, can reduce effective throughput to zero in pathological cases. &amp;nbsp; Shard-based execution avoids deadlocks.&lt;br /&gt;&lt;br /&gt;* &lt;b&gt;Ordering&lt;/b&gt;. 
&amp;nbsp;SQL gives you a lot of ways to shoot yourself in the foot through bad transaction ordering. &amp;nbsp;You can't write to a table before creating it. &amp;nbsp;You can't delete a row before it is inserted. &amp;nbsp;Violating these rules does not just lead to invalid data but also causes errors that stop replication. &amp;nbsp;The workarounds are either unreliable and slow (conflict resolution) or impractical for most applications (make everything an insert). &amp;nbsp;To avoid this you need to observe serialization very carefully.&lt;br /&gt;&lt;br /&gt;* &lt;b&gt;Throughput&lt;/b&gt;. &amp;nbsp;SQL transactions in real systems vary tremendously in duration, which tends to result in individual long transactions blocking simpler parallelization schemes that use in-memory distribution of updates. &amp;nbsp;In the Tungsten model we can solve this by letting shard progress vary (by hours potentially), something that is only possible with a well-defined serialization model that deals with dependencies between parallel update streams. &amp;nbsp;I don't know of another approach that deals with this problem. &lt;br /&gt;&lt;br /&gt;If you mess up the solution to any of the foregoing problems, chances are good you will irreparably corrupt data, which leads to replication going completely off the rails. &amp;nbsp;Then you reprovision your slave(s). &amp;nbsp;The databases that most need parallel replication are very large, so this is a multi-hour or even multi-day process. &amp;nbsp;It makes for unpleasant calls with customers when you tell them they need to do this. &lt;br /&gt;&lt;br /&gt;I don't spend a lot of time worrying that Tungsten parallel replication is not well suited to the single big schema problem. &amp;nbsp;So far, the only ways I can think of making it work scalably require major changes to the DBMS or the applications that use it. &amp;nbsp;In many cases your least costly alternative may be to use SSDs to boost slave I/O performance. 
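&lt;br /&gt;&lt;br /&gt;To make the serialized shard idea a little more concrete, here is a minimal Python sketch. &amp;nbsp;It is not Tungsten code. &amp;nbsp;It assumes each transaction is a (seqno, shard ID) pair, where the seqno stands in for the global transaction ID, and it uses a CRC32 hash to pick a channel, which is purely my choice for illustration. &amp;nbsp;The names assign_channel and partition are hypothetical, and critical shards are not modeled at all.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;

```python
import zlib


def assign_channel(shard_id, channels):
    """Map a shard ID to a channel number using a stable hash.

    CRC32 is an assumption made for this sketch, not Tungsten's algorithm.
    """
    return zlib.crc32(shard_id.encode("utf-8")) % channels


def partition(transactions, channels):
    """Split (seqno, shard_id) pairs into one list per channel.

    Sorting by seqno first means every channel sees its transactions
    in global serial order, so each shard is applied serially.
    """
    queues = [[] for _ in range(channels)]
    for seqno, shard_id in sorted(transactions):
        queues[assign_channel(shard_id, channels)].append((seqno, shard_id))
    return queues


# Hypothetical workload: four transactions spread over two schemas.
txns = [(3, "db2"), (1, "db1"), (2, "db2"), (4, "db1")]
queues = partition(txns, 2)
```

&lt;/code&gt;&lt;/pre&gt;The point of the sketch is only that all transactions for one shard land in the same channel and keep their serial order there, which is what lets different channels run in parallel.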
&lt;br /&gt;&lt;br /&gt;My concerns about Tungsten's model lie in a different area. &amp;nbsp;The serialized shard model is theoretically sound--it has essentially the same semantics as causally dependent messaging in distributed systems. &amp;nbsp;However, if we fail to identify shards correctly (and don't know we failed) we will have crashes and corrupt data. &amp;nbsp;I want Tungsten either to work properly or tell users it won't work and degrade gracefully to full serialization. &amp;nbsp;If we can't do one of these two for every conceivable sequence of transactions that's a serious problem. &lt;br /&gt;&lt;br /&gt;So, to get back to my original point, serialized shards are the best model for parallel replication in SQL databases as we find them today. &amp;nbsp;I suspect if you look at some of the other incipient designs for parallel replication on MySQL you will find that they follow this model in the end if not at first. &amp;nbsp;I would think in fact that the next step is to add MySQL features that make sharded replication more effective. 
&amp;nbsp;The drizzle team seems to be thinking along these lines already.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-5488735872437270768?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/5488735872437270768/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=5488735872437270768' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5488735872437270768'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5488735872437270768'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/03/parallel-replication-using-shards-is.html' title='Parallel Replication Using Shards Is the Only Workable Approach for SQL'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-4579074067984380507</id><published>2011-03-20T20:56:00.000-07:00</published><updated>2011-03-20T22:47:56.378-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='IT Industry'/><category scheme='http://www.blogger.com/atom/ns#' term='Apple'/><title type='text'>Is Apple Good for Innovation?</title><content type='html'>Just about everyone on the planet agrees that Apple products are the soul of 
innovative design.&amp;nbsp; But are they good for innovators?&amp;nbsp; For me the answer is "not so much." &lt;br /&gt;&lt;br /&gt;I have been using Apple laptops and iPhones for years.&amp;nbsp; As a software developer, I have a list of annoyances with Mac OS X starting with Apple's incomprehensible management of Java.&amp;nbsp; However, Mac OS X is far more productive than MS Windows, with its viruses, crummy OS releases, and bloatware.&amp;nbsp; iPhones are close to worthless as telephones in the area where I live in large part due to ATT's network.&amp;nbsp; But you can now switch to Verizon, so that's not such a problem either.&lt;br /&gt;&lt;br /&gt;The real problem with Apple is that their products are closed.&amp;nbsp; Want to install a new file system?&amp;nbsp; Not here.&amp;nbsp; Want to pick a different motherboard to play around with power utilization?&amp;nbsp; Try somewhere else.&amp;nbsp; Want to know what the OS is really doing under the covers or (gasp) inspect the source code?&amp;nbsp; Dieu forfend!&lt;br /&gt;&lt;br /&gt;Innovation in my chosen field of databases is increasingly based on breaking down the dividing lines between hardware and software to manage massive quantities of data economically and quickly.&amp;nbsp; The more I learn about hardware, the less I want fully integrated products. &amp;nbsp;I want devices I can interact with and learn from. &amp;nbsp;I want visibility into internals. &amp;nbsp;I want works-in-progress, not ready-made perfection. &amp;nbsp;In short, I want open platforms that give me the parts but do not tell me what to build with them. &lt;br /&gt;&lt;br /&gt;A few weeks ago my iPhone dropped on the floor and shattered. &amp;nbsp;The replacement is a Droid 2 Global running Android. &amp;nbsp;The user interface is clumsy. &amp;nbsp;You have to watch out for viruses again. &amp;nbsp;But the hardware is lightning fast. &amp;nbsp;There is a free-for-all of people inventing new Android applications. 
&amp;nbsp;The source code for Android itself is available on &lt;a href="http://code.google.com/"&gt;code.google.com&lt;/a&gt;. &amp;nbsp;The open nature of Android is rapidly making it the locus of innovation for mobile devices. &amp;nbsp;I feel at home already.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-4579074067984380507?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/4579074067984380507/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=4579074067984380507' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/4579074067984380507'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/4579074067984380507'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/03/is-apple-good-for-innovation.html' title='Is Apple Good for Innovation?'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-1592246192925073235</id><published>2011-03-20T02:15:00.000-07:00</published><updated>2011-03-20T02:15:00.640-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='SaaS'/><title type='text'>Tuning Tungsten Parallel Replication 
Performance</title><content type='html'>Last month my colleague &lt;a href="http://datacharmer.org/"&gt;Giuseppe Maxia&lt;/a&gt; described &lt;a href="http://datacharmer.blogspot.com/2011/02/advanced-replication-for-masses-part-ii.html"&gt;how to operate Tungsten parallel replication&lt;/a&gt;.  Since then we have been doing a good bit of benchmarking on both synthetic and real production loads.  In this article I would like to follow up with some tips about how you can goose up parallel replication performance.&amp;nbsp; These apply to Tungsten Replicator 2.0.1, which you can find &lt;a href="http://code.google.com/p/tungsten-replicator/"&gt;here&lt;/a&gt;.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;The first way to get good performance with Tungsten is to have the right workload.  As explained in an &lt;a href="http://scale-out-blog.blogspot.com/2010/10/parallel-replication-on-mysql-report.html"&gt;earlier article&lt;/a&gt; on this blog, Tungsten parallel replication works by replicating independent databases (aka shards) in parallel.&amp;nbsp; Here is a picture that summarizes what is going on. &lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh3.googleusercontent.com/-MN_Ww6byo7U/TYXEUdB0qFI/AAAAAAAAAG8/DQpMHjiwWAY/s1600/parallel-apply-workload-example.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="227" src="https://lh3.googleusercontent.com/-MN_Ww6byo7U/TYXEUdB0qFI/AAAAAAAAAG8/DQpMHjiwWAY/s400/parallel-apply-workload-example.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;If you have a lot of schemas, if the updates are distributed evenly across schemas, and if you don't have many dependencies between schemas that require full serialization, parallel replication can speed things up significantly for I/O-bound workloads. 
&amp;nbsp;For example, Tungsten runs three times faster than MySQL native replication on large datasets when the slave is catching up to the master following mysqld restart.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;Catch-up is a famous slave lag case and one where Tungsten can be quite helpful. &amp;nbsp;(I think we will be faster in the future, but this is a good start.) &amp;nbsp;Nevertheless, there's a chance you'll need to do a bit of tuning to see such benefits yourself. &lt;br /&gt;&lt;br /&gt;Tungsten currently uses a structure called a parallel queue to enable parallelization. &amp;nbsp;The parallel queue typically sits at the end of a replicator pipeline in front of the parallel apply threads, as shown in the following handy diagram.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh4.googleusercontent.com/-ErXrkaUarBE/TYWv8BwaQSI/AAAAAAAAAG4/pcF9voS_pak/s1600/simple-parallel-pipeline.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="242" src="https://lh4.googleusercontent.com/-ErXrkaUarBE/TYWv8BwaQSI/AAAAAAAAAG4/pcF9voS_pak/s400/simple-parallel-pipeline.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;One key to getting decent parallel replication performance is to watch the parallel queue in operation. &amp;nbsp;In Tungsten Replicator 2.0.1 we introduced a new status command &lt;code&gt;trepctl status -name stores&lt;/code&gt; that goes a long way to help diagnose how well parallel replication is performing. &amp;nbsp; Here's a typical example using a 6 channel queue store. 
&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;$ trepctl status -name stores&lt;br /&gt;Processing status command (stores)...&lt;br /&gt;NAME                VALUE&lt;br /&gt;----                -----&lt;br /&gt;criticalPartition : -1&lt;br /&gt;discardCount      : 0&lt;br /&gt;eventCount        : 3217&lt;br /&gt;maxSize           : 1000&lt;br /&gt;name              : parallel-queue&lt;br /&gt;queues            : 6&lt;br /&gt;serializationCount: 1&lt;br /&gt;serialized        : false&lt;br /&gt;stopRequested     : false&lt;br /&gt;store.queueSize.0 : 0&lt;br /&gt;store.queueSize.1 : 480&lt;br /&gt;store.queueSize.2 : 310&lt;br /&gt;store.queueSize.3 : 1000&lt;br /&gt;store.queueSize.4 : 739&lt;br /&gt;store.queueSize.5 : 407&lt;br /&gt;storeClass        : com.continuent.tungsten.enterprise.replicator.store.ParallelQueueStore&lt;br /&gt;storeSize         : 2936&lt;br /&gt;syncEnabled       : true&lt;br /&gt;syncInterval      : 100&lt;br /&gt;Finished status command (stores)...&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;The two most important things to look at are distribution of transactions across queues and serialization. &amp;nbsp;Let's start with transaction distribution. &amp;nbsp;In this particular example we were running a parallel queue with 6 channels but only 5 databases. &amp;nbsp;The distribution therefore looks pretty good. &amp;nbsp;One queue is empty but the others have a fairly even distribution of transactions. &lt;br /&gt;&lt;br /&gt;Notice that one queue has exactly 1000 transactions. &amp;nbsp;In Tungsten Replicator 2.0.1 the parallel queue has a maximum size parameter (maxSize), which is set to 1000 for this example run. &amp;nbsp;Once an individual queue hits the maxSize limit, the entire parallel queue blocks. &amp;nbsp;It is not uncommon to see one queue blocking in this way if the replicator is catching up, which is exactly what is happening here. 
&amp;nbsp;In fact, if the queues are all empty it is possible Tungsten is somehow not supplying transactions to the queue fast enough. &amp;nbsp;That is not a problem here.&lt;br /&gt;&lt;br /&gt;Bad workloads, on the other hand, tend to have a lot of transactions in one or two queues and few or none in all the rest.  The following is an example of a possibly bad distribution.  &lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;$ trepctl status -name stores&lt;br /&gt;Processing status command (stores)...&lt;br /&gt;NAME                VALUE&lt;br /&gt;----                -----&lt;br /&gt;...&lt;br /&gt;store.queueSize.0 : 0&lt;br /&gt;store.queueSize.1 : 4&lt;br /&gt;store.queueSize.2 : 3&lt;br /&gt;store.queueSize.3 : 972&lt;br /&gt;store.queueSize.4 : 0&lt;br /&gt;store.queueSize.5 : 15&lt;br /&gt;...&lt;br /&gt;Finished status command (stores)...&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;If you see such skewed distributions persistently, you may want to try to adjust the queue partitioning using the &lt;code&gt;shard.list&lt;/code&gt; file.  The default parallel queue partitioning algorithm hashes shards into channels.  This does not always give optimal performance if your shards mostly happen to hash into the same channel. &amp;nbsp;The other possibility is that the workload is just badly distributed across databases.&lt;br /&gt;&lt;br /&gt;You can decide whether the workload or partitioning is at fault using the &lt;code&gt;trepctl status -name shards&lt;/code&gt; command. 
&amp;nbsp;Here's an example.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;$ ./trepctl status -name shards&lt;br /&gt;Processing status command (shards)...&lt;br /&gt;NAME                VALUE&lt;br /&gt;----                -----&lt;br /&gt;appliedLastEventId: 000007:0000000000000384;20&lt;br /&gt;appliedLastSeqno  : 1471201&lt;br /&gt;appliedLatency    : 0.0&lt;br /&gt;eventCount        : 6&lt;br /&gt;shardId           : #UNKNOWN&lt;br /&gt;stage             : d-pq-to-dbms&lt;br /&gt;NAME                VALUE&lt;br /&gt;----                -----&lt;br /&gt;appliedLastEventId: 000005:0000000326365895;41&lt;br /&gt;appliedLastSeqno  : 1470999&lt;br /&gt;appliedLatency    : 0.0&lt;br /&gt;eventCount        : 311605&lt;br /&gt;shardId           : db1&lt;br /&gt;stage             : d-pq-to-dbms&lt;br /&gt;NAME                VALUE&lt;br /&gt;----                -----&lt;br /&gt;appliedLastEventId: 000005:0000000326512277;95&lt;br /&gt;appliedLastSeqno  : 1471200&lt;br /&gt;appliedLatency    : 0.0&lt;br /&gt;eventCount        : 298522&lt;br /&gt;shardId           : db2&lt;br /&gt;stage             : d-pq-to-dbms&lt;br /&gt;...&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;This shows that the distribution of transactions between the db1 and db2 databases is pretty even. &amp;nbsp;If you have many databases with roughly even values in the eventCount parameter, the workload is well suited for parallelization. &amp;nbsp;In that case you may want to assign shards explicitly in the shard.list file if you don't like the distribution in the parallel queue.&lt;br /&gt;&lt;br /&gt;Meanwhile, the previous example also illustrates another potential problem. &amp;nbsp;We also see counts for #UNKNOWN, which is a special shard ID that means "I could not tell what schema this is." #UNKNOWN transactions can occur if Tungsten cannot parse a SQL statement properly or there is a transaction that updates multiple schemas. &amp;nbsp;In either case, Tungsten serializes the parallel queue. 
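&lt;br /&gt;&lt;br /&gt;If you would rather not eyeball these numbers, a small script can do the counting. &amp;nbsp;The following is a minimal Python sketch, not part of Tungsten, that assumes the shards status output has exactly the layout shown above. &amp;nbsp;The function names parse_shard_counts and unknown_share and the SAMPLE text are my own illustrations, so adjust them if your output differs.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;

```python
SAMPLE = """\
Processing status command (shards)...
NAME                VALUE
----                -----
appliedLastEventId: 000007:0000000000000384;20
appliedLastSeqno  : 1471201
appliedLatency    : 0.0
eventCount        : 6
shardId           : #UNKNOWN
stage             : d-pq-to-dbms
NAME                VALUE
----                -----
appliedLastEventId: 000005:0000000326365895;41
appliedLastSeqno  : 1470999
appliedLatency    : 0.0
eventCount        : 311605
shardId           : db1
stage             : d-pq-to-dbms
NAME                VALUE
----                -----
appliedLastEventId: 000005:0000000326512277;95
appliedLastSeqno  : 1471200
appliedLatency    : 0.0
eventCount        : 298522
shardId           : db2
stage             : d-pq-to-dbms
"""


def parse_shard_counts(status_text):
    """Return a dict of shardId to eventCount from the shards status text."""
    counts = {}
    current = {}
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("NAME"):
            # Each NAME/VALUE header starts a new shard record.
            current = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip rulers, banners, and elided lines
        current[key.strip()] = value.strip()
        if "shardId" in current and "eventCount" in current:
            counts[current["shardId"]] = int(current["eventCount"])
    return counts


def unknown_share(counts):
    """Fraction of all events that were assigned to the #UNKNOWN shard."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.get("#UNKNOWN", 0) / total


counts = parse_shard_counts(SAMPLE)
share = unknown_share(counts)
```

&lt;/code&gt;&lt;/pre&gt;With the sample above, the #UNKNOWN shard accounts for 6 of 610133 events, which is a negligible share. &amp;nbsp;A noticeably larger share would be a hint that serialization deserves a closer look.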
&lt;br /&gt;&lt;br /&gt;However it occurs, serialization is a performance killer because it means we have to block the parallel queue until all parallel transactions complete, execute one or more transactions serially, and then reopen the parallel queue. &amp;nbsp;You can see how often this is happening from the serializationCount value on trepctl status -name stores. &amp;nbsp;For many workloads a serializationCount value that is more than a few percent of the number in eventCount means the entire transaction stream is effectively serialized. &lt;br /&gt;&lt;br /&gt;If the serialization is occurring due to #UNKNOWN shards, you may be able to improve things using&amp;nbsp;a new value in the replicator service properties file that was added in version 2.0.1. It controls whether we assign the shard ID using the default schema even if Tungsten cannot tell from the SQL command what you are doing.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;# Policy for shard assignment based on default database.  If 'stringent', use&lt;br /&gt;# default database only if SQL is recognized.  For 'relaxed' always use the&lt;br /&gt;# default database if it is available.&lt;br /&gt;replicator.shard.default.db=relaxed&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;Setting the parameter to &lt;code&gt;relaxed&lt;/code&gt; can help quite a bit if the problem is due to unusual SQL that confuses the parser. &amp;nbsp;On one workload we were able to reduce the serializationCount from about 10% of transactions to 0% in this way. 
&amp;nbsp;We then saw the expected speed-up from parallel replication.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-1592246192925073235?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/1592246192925073235/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=1592246192925073235' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/1592246192925073235'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/1592246192925073235'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/03/tuning-tungsten-parallel-replication.html' title='Tuning Tungsten Parallel Replication Performance'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='https://lh3.googleusercontent.com/-MN_Ww6byo7U/TYXEUdB0qFI/AAAAAAAAAG8/DQpMHjiwWAY/s72-c/parallel-apply-workload-example.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-6728775844058870500</id><published>2011-03-07T18:46:00.000-08:00</published><updated>2011-03-07T18:43:58.907-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><title type='text'>Understanding Tungsten Replication Services</title><content type='html'>If 
you follow Giuseppe Maxia's &lt;a href="http://datacharmer.blogspot.com/"&gt;Datacharmer blog&lt;/a&gt; you have seen several recent articles on &lt;a href="http://www.continuent.com/solutions/tungsten-replicator"&gt;Tungsten Replicator&lt;/a&gt;.&amp;nbsp; Giuseppe and I work closely together on replication at Continuent, and I have promised a set of articles about replication internals to complement the practical introduction provided by Giuseppe.&amp;nbsp; In this first article I will describe &lt;i&gt;replication services&lt;/i&gt;, which are message processing flows that run in the Tungsten Replicator.&lt;br /&gt;&lt;br /&gt;Unlike many replication engines, Tungsten Replicator can run multiple replication services concurrently.&amp;nbsp; There is a central management interface that allows you to start new replication services without disturbing services that are already running.&amp;nbsp; Each replication service also has its own management interface so that you can put the loaded service online, offline, etc. without disturbing other replication work.&amp;nbsp; As Tungsten is written in Java, the management interfaces are based on JMX, a standard administrative interface for Java applications. &lt;br /&gt;&lt;br /&gt;Here is a simple diagram that shows a Tungsten Replicator with two replication services named fra and nyc that replicate from separate DBMS masters in Frankfurt and NYC into a single slave in San Francisco. 
&amp;nbsp; You can immediately see the power of replication services--a single Tungsten Replicator process can simultaneously replicate between several locations.&amp;nbsp; Replication services are an important building block for the type of complex setups that Giuseppe Maxia discussed in his &lt;a href="http://advanced%20replication%20for%20the%20masses%20-%20part%20iii%20-%20replication%20topologies%20/"&gt;blog article on Replication Topologies&lt;/a&gt;.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh5.googleusercontent.com/-2fvy-5TaWNM/TXVz0FjXDfI/AAAAAAAAAGs/TN_FwUR-q5k/s1600/replication-services-diagram-1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="280" src="https://lh5.googleusercontent.com/-2fvy-5TaWNM/TXVz0FjXDfI/AAAAAAAAAGs/TN_FwUR-q5k/s640/replication-services-diagram-1.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Users who are handy with Java can write their own programs to manipulate the JMX interfaces directly.&amp;nbsp; If not, there is the &lt;b&gt;trepctl&lt;/b&gt; utility, which is supplied with the Tungsten Replicator and works off the command line.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;If the Tungsten Replicator architecture reminds you of a Java application server, you are absolutely right.&amp;nbsp; Java VMs have a relatively large resource footprint compared to ordinary C programs, so it is typically more efficient to put multiple applications in a single VM rather than running a lot of individual Java processes.&amp;nbsp; Tungsten replication services follow the same design pattern, except that instead of serving web pages they replicate database transactions.&amp;nbsp;&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Let's now look a little more deeply at how Tungsten Replicator organizes replication services.&amp;nbsp; Each replication service runs a single pipeline, which is Tungsten parlance for a configurable 
message flow.&amp;nbsp; (For more on pipelines, read &lt;a href="http://scale-out-blog.blogspot.com/2010/04/customized-data-movement-with-tungsten.html"&gt;here&lt;/a&gt;.)&amp;nbsp; When the service starts, it loads an instance of a Java class called OpenReplicatorManager that handles the state machine for replication (online, offline, etc.) and provides the management interfaces for the services.&amp;nbsp; The OpenReplicatorManager instance in turn depends on a number of external resources from the file system and DBMS.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Here is another diagram showing how Tungsten Replicator organizes all of the various parts for services.&amp;nbsp; Services need a configuration file for the pipeline, as well as various bits of disk space to store transaction logs and replication metadata.&amp;nbsp; The big challenge is to ensure things do not accidentally collide. &lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh5.googleusercontent.com/-Z0rvmA2820Q/TXWWy3NwAOI/AAAAAAAAAGw/tmdU3SFJuFI/s1600/replication-services-diagram-2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="408" src="https://lh5.googleusercontent.com/-Z0rvmA2820Q/TXWWy3NwAOI/AAAAAAAAAGw/tmdU3SFJuFI/s640/replication-services-diagram-2.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;This layout seems a bit complex at first but is reasonably simple once you get used to it.&amp;nbsp; Let's start with service configuration using our &lt;b&gt;fra&lt;/b&gt; service as an example.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Service configuration files are stored in the tungsten-replicator/conf directory.&amp;nbsp;&amp;nbsp; There are up to two files for each service.&amp;nbsp; The &lt;b&gt;static-fra.properties&lt;/b&gt; file defines all properties of the service, pipeline organization and properties 
like the replication role or master address that may change during operation.&amp;nbsp; The dynamic-fra.properties file contains overrides to selected properties.&amp;nbsp; For instance, if you switch the replication role from slave to master as part of a failover operation, it goes in the &lt;b&gt;dynamic-fra.properties&lt;/b&gt; file.&amp;nbsp; Tungsten Replicator reads the static file first, then applies the overrides when it starts the service.&lt;br /&gt;&lt;br /&gt;Next, we have Tungsten transaction logs, also known as the Transaction History Log.&amp;nbsp; This is a list of all transactions to be replicated along with metadata like global transaction IDs and shard IDs.&amp;nbsp; THL files for each service are normally stored in the logs directory at the same level as the tungsten release directory itself.&amp;nbsp; There is a separate directory for each service, as for example &lt;b&gt;logs/fra&lt;/b&gt;.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Next we have Tungsten relay logs.&amp;nbsp; These are downloaded binlogs from a MySQL master DBMS from which the replication service creates the Tungsten transaction logs.&amp;nbsp; Not every replication service uses these.&amp;nbsp; They are required when the MySQL master is on another host, or the binlogs are on an NFS-mounted file system, which Tungsten does not parse very efficiently yet.&amp;nbsp; Tungsten relay logs use the same pattern as the THL--everything is stored under relay-logs with a separate subdirectory for each service, for example &lt;b&gt;relay-logs/fra&lt;/b&gt;.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Finally, there is metadata in the DBMS itself.&amp;nbsp; Each replication service has a database that it uses to store restart points for replication (table trep_commit_seqno) as well as heartbeats and consistency checks (tables heartbeat and consistency, respectively).&amp;nbsp;&amp;nbsp; The name of this database is tungsten_&lt;i&gt;servicename&lt;/i&gt;, as in 
&lt;b&gt;tungsten_fra&lt;/b&gt;.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Setting up services is difficult to do manually, so Tungsten Replicator 2.0 has a program named configure-service that defines new replication services and removes old ones by deleting all traces including the database catalogs. You can find out all about installation and starting services by looking at the Tungsten Replicator 2.0 Installation and Configuration Guide, which is located &lt;a href="http://www.continuent.com/downloads/documentation"&gt;here&lt;/a&gt;.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Services have been part of Tungsten Replicator for a while but we have only recently begun to talk about them more widely as part of the release of Tungsten Replicator 2.0.0 in February 2011, especially as we start to do more work with multi-master topologies.&amp;nbsp; One of the comments we get is that replication services make Tungsten seem complicated and therefore harder to use, especially compared with MySQL replication, which is relatively easy to set up.&amp;nbsp; That's a fair criticism.&amp;nbsp; Tungsten Replicator is really a very configurable toolkit for replication and does far more than MySQL replication or just about any other open source replicator for that matter. 
&amp;nbsp; Like most toolkits, the trade-off for power is complexity.&lt;br /&gt;&lt;br /&gt;We are therefore working on automating as much of the configuration as possible, so that you can set up even relatively complex topologies with just a couple of commands.&amp;nbsp; You'll see more of this as we make additional replicator releases (version 2.0.1 will be out shortly) and push features fully into open source.&amp;nbsp;&amp;nbsp; Meanwhile, if you have comments on Tungsten 2.0.0 please feel free to post them back to us.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-6728775844058870500?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/6728775844058870500/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=6728775844058870500' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/6728775844058870500'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/6728775844058870500'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/03/understanding-tungsten-replication.html' title='Understanding Tungsten Replication Services'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='https://lh5.googleusercontent.com/-2fvy-5TaWNM/TXVz0FjXDfI/AAAAAAAAAGs/TN_FwUR-q5k/s72-c/replication-services-diagram-1.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-6791807547638633260</id><published>2011-01-30T18:36:00.000-08:00</published><updated>2011-01-31T10:13:39.960-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Drizzle'/><title type='text'>Virtual IP Addresses and Their Discontents for Database Availability</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;Virtual IP addresses or &lt;i&gt;VIPs&lt;/i&gt; are commonly used to enable database high availability.&amp;nbsp;&amp;nbsp; A standard failover design uses an active/passive DBMS server pair connected by replication and watched by a cluster manager.&amp;nbsp; The active database listens on a virtual IP address; applications use it for database connections instead of the normal host IP address. Should the active database fail, the cluster manager promotes the passive database server and shifts the floating IP address to the newly promoted host.&amp;nbsp; Application connections break and then reconnect to the VIP again, which points them to the new database. 
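&lt;br /&gt;&lt;br /&gt;The reconnect step is the one part of this design that lives in application code, so it is worth a quick sketch.&amp;nbsp; The Python below is only an illustration under stated assumptions: &lt;b&gt;connect&lt;/b&gt; and &lt;b&gt;work&lt;/b&gt; are hypothetical callables supplied by the application, and &lt;b&gt;ConnectionLost&lt;/b&gt; is a placeholder for whatever error your database driver raises when a connection drops.&amp;nbsp; It does not use any real driver API.&lt;br /&gt;&lt;pre&gt;

```python
import time


class ConnectionLost(Exception):
    """Placeholder for the driver error raised when a connection breaks."""


def run_with_reconnect(connect, work, retries=5, delay=1.0, sleep=time.sleep):
    """Run work(conn), opening a fresh connection to the VIP after each failure.

    connect: hypothetical callable that opens a connection to the VIP address.
    work:    hypothetical callable that uses the connection and returns a result.
    """
    last_error = None
    for _ in range(retries + 1):
        try:
            conn = connect()      # always dial the VIP, never a host address
            return work(conn)
        except ConnectionLost as err:
            last_error = err      # connection broke, possibly due to failover
            sleep(delay)          # give the cluster manager time to move the VIP
    raise last_error
```

&lt;/pre&gt;The point of the sketch is that every retry dials the same VIP, so the application never needs to know which host currently owns it.&amp;nbsp; In a real application the unit of retry would normally be a whole transaction, since uncommitted work on the broken connection is lost.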
&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_26KnjtB2MFo/TUU44yNMVXI/AAAAAAAAAGU/gs_WVUWxTEg/s1600/vip-based-failover.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="257" src="http://2.bp.blogspot.com/_26KnjtB2MFo/TUU44yNMVXI/AAAAAAAAAGU/gs_WVUWxTEg/s400/vip-based-failover.jpg" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;VIP-Based Database Availability Design&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;Virtual IP addresses are enticing because they are completely transparent to applications--no changes to database API behavior, no changes to connection strings, etc.&amp;nbsp; While virtual IP addresses seem simple, they depend on arcane TCP/IP behavior that is not especially well understood and not always consistent across different TCP/IP implementations.&amp;nbsp; Unless properly managed, virtual IP addresses can induce problems that run the gamut from simple lack of availability to split-brain, which can in turn lead to unrepairable data corruption.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;This blog article describes in some detail how virtual IP addresses work, specifically the behavior of &lt;a href="http://en.wikipedia.org/wiki/Address_Resolution_Protocol"&gt;Address Resolution Protocol&lt;/a&gt; (ARP) which is a core part of the TCP/IP stack that maps IP numbers to hardware addresses and permits VIPs to move.&amp;nbsp; We will then dig into how split-brain arises as a consequence of ARP behavior.&amp;nbsp; Finally we will look at ways to avoid split-brain or at least make it less dangerous when it occurs.&lt;br /&gt;&lt;br 
/&gt;&lt;u&gt;Note&lt;/u&gt;: the examples below use MySQL, in part because &lt;b&gt;mysqld&lt;/b&gt; has a convenient way to show the server host name.&amp;nbsp; However, you can set up the same test scenarios with PostgreSQL or most other DBMS implementations for that matter.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;b&gt;What is a Virtual IP Address?&amp;nbsp;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Network interface cards (NICs) typically bind to a single IP address in TCP/IP networks.&amp;nbsp;&amp;nbsp; However, you can also tell the NIC to listen on extra addresses using the Linux &lt;b&gt;ifconfig&lt;/b&gt; command.&amp;nbsp; Such addresses are called &lt;i&gt;virtual IP addresses&lt;/i&gt; or VIPs for short.&lt;br /&gt;&lt;br /&gt;Let's say, for example, that you have an existing interface eth0 listening on IP address 192.168.128.114.&amp;nbsp; Here's the configuration of that interface as shown by the ifconfig command:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;saturn# ifconfig eth0&lt;br /&gt;eth0      Link encap:Ethernet  HWaddr 08:00:27:ce:5f:8e  &lt;br /&gt;          inet addr:192.168.128.114  Bcast:192.168.128.255  Mask:255.255.255.0&lt;br /&gt;          inet6 addr: fe80::a00:27ff:fece:5f8e/64 Scope:Link&lt;br /&gt;          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1&lt;br /&gt;          RX packets:10057681 errors:0 dropped:0 overruns:0 frame:0&lt;br /&gt;          TX packets:8902384 errors:0 dropped:0 overruns:0 carrier:0&lt;br /&gt;          collisions:0 txqueuelen:1000 &lt;br /&gt;          RX bytes:6941125495 (6.9 GB)  TX bytes:6305062533 (6.3 GB)&lt;/code&gt;&lt;/pre&gt;You can allow the eth0 interface to accept traffic for another address, 192.168.128.130, using the following simple command.&amp;nbsp; &lt;br /&gt;&lt;pre&gt;&lt;code&gt;saturn# ifconfig eth0:0 192.168.128.130 up&lt;/code&gt;&lt;/pre&gt;This command tells the TCP/IP stack to accept packets directed to IP address 192.168.128.130 as well as the original address 192.168.128.114.&amp;nbsp;&amp;nbsp; This means 
that if you are running a MySQL server on the host users can connect using either of the following commands:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;mysql -utungsten -psecret -h192.168.128.114 (or)&lt;br /&gt;mysql -utungsten -psecret -h192.168.128.130&lt;/code&gt;&lt;/pre&gt;Finally, you can move VIPs from host to host very easily by dropping them on the first host and binding to them on a second host, as shown in the following example: &lt;br /&gt;&lt;pre&gt;&lt;code&gt;saturn# ifconfig eth0:0 192.168.128.130 down&lt;br /&gt;...&lt;br /&gt;neptune# ifconfig eth0:0 192.168.128.130 up&lt;/code&gt;&lt;/pre&gt;Meanwhile, virtual IP addresses behave in every other way like standard IP addresses.&amp;nbsp; You can put them in DNS, applications can connect to them, etc.&amp;nbsp; VIP-based high availability is therefore a minimally invasive implementation for most applications--about the only requirement is that applications need to reconnect if their connection breaks. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;How Virtual IP Addresses Work&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;To understand the weaknesses of virtual IP addresses for database high availability, it helps to understand exactly how the &lt;a href="http://en.wikipedia.org/wiki/TCP/IP_model"&gt;TCP/IP stack&lt;/a&gt; actually routes messages between IP addresses, including those that correspond to VIPs.&amp;nbsp;&amp;nbsp; The following diagram shows moving parts of the TCP/IP stack in a typical active/passive database set-up with a single client host.&amp;nbsp; In this diagram host saturn currently has virtual IP address 192.168.128.130. Neptune is the other database host.&amp;nbsp; Mercury is the client host. 
&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_26KnjtB2MFo/TUU45rxREnI/AAAAAAAAAGY/RqJzG_8Df3Y/s1600/tcp-ip-with-arp.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="206" src="http://1.bp.blogspot.com/_26KnjtB2MFo/TUU45rxREnI/AAAAAAAAAGY/RqJzG_8Df3Y/s400/tcp-ip-with-arp.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Applications direct TCP/IP packets using IP addresses, which in IPv4 are four-byte numbers.&amp;nbsp; The IP destination address is written into the header by the IP layer of the sending host and read by the IP layer on the receiving host.&lt;br /&gt;&lt;br /&gt;However, this is not enough to get packets from one host to another.&amp;nbsp; At the hardware level within a single LAN, network interfaces use &lt;a href="http://en.wikipedia.org/wiki/MAC_address"&gt;MAC addresses&lt;/a&gt; to route messages over physical connections like Ethernet.&amp;nbsp;&amp;nbsp; MAC addresses are 48-bit addresses that are typically burned into the NIC firmware or set in a configuration file if you are running a virtual machine.&amp;nbsp; To forward a packet from host mercury to saturn, the link layer writes in the proper MAC address, in this case 08:00:27:ce:5f:8e.&amp;nbsp; The link layer on host saturn accepts this packet, since it corresponds to its MAC address.&amp;nbsp; It forwards the packet up into the IP layer for acceptance and further processing.&lt;br /&gt;&lt;br /&gt;Yet how does host mercury figure out which MAC address to use for particular IP addresses?&amp;nbsp; This is the job of the ARP cache, which maintains an up-to-date mapping between IP and MAC addresses.&amp;nbsp; You can view the ARP cache contents using the &lt;b&gt;arp&lt;/b&gt; command, as shown in the following example.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;mercury# arp -an&lt;br /&gt;? (192.168.128.41) at 00:25:00:44:f3:ce [ether] on eth0&lt;br /&gt;? 
(192.168.128.1) at 00:0f:cc:74:64:5c [ether] on eth0&lt;/code&gt;&lt;/pre&gt;Note that the ARP cache does not have a mapping for the VIP address 192.168.128.130.&amp;nbsp; Let's say we now make a client connection to the MySQL server at the other end of the VIP address on mercury:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;# mysql -utungsten -psecret -h192.168.128.130&lt;br /&gt;Welcome to the MySQL monitor.&amp;nbsp; Commands end with ; or \g.&lt;br /&gt;Your MySQL connection id is 33826&lt;br /&gt;...&lt;br /&gt;mysql&amp;gt;&lt;/code&gt;&lt;/pre&gt;To route traffic, host mercury gets the IP address to MAC address mapping using an ARP request.&amp;nbsp; You can watch this happen in real time using the &lt;b&gt;tcpdump&lt;/b&gt; command to track ARP traffic.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;mercury# tcpdump -n -i eth0 arp &lt;br /&gt;tcpdump: verbose output suppressed, use -v or -vv for full protocol decode&lt;br /&gt;listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes&lt;br /&gt;09:37:43.755081 ARP, Request who-has 192.168.128.130 tell 192.168.128.110, length 28&lt;br /&gt;09:37:43.755360 ARP, Reply 192.168.128.130 is-at 08:00:27:ce:5f:8e, length 46&lt;/code&gt;&lt;/pre&gt;Now if you look at the ARP cache again on host mercury, you will see a proper mapping for the VIP in mercury's ARP cache:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;# arp -an&lt;br /&gt;? (192.168.128.130) at 08:00:27:ce:5f:8e [ether] on eth0&lt;br /&gt;? (192.168.128.41) at 00:25:00:44:f3:ce [ether] on eth0&lt;br /&gt;? 
(192.168.128.1) at 00:0f:cc:74:64:5c [ether] on eth0&lt;/code&gt;&lt;/pre&gt;Now if you go back and look at the picture (or still recall the details), you will notice that the MAC address maps to the NIC on host saturn.&amp;nbsp; This is exactly what we expect since saturn is listening on the corresponding virtual IP address 192.168.128.130.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Virtual IP Addresses and Split-Brain&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Most real problems with VIPs appear when you try to move them.&amp;nbsp; The reason is simple:&amp;nbsp; TCP/IP does not stop you from having multiple hosts listening on the same virtual IP address.&amp;nbsp; For instance, let's say you issue the following command on host neptune without first dropping the virtual IP address on saturn. &amp;nbsp; &lt;br /&gt;&lt;pre&gt;&lt;code&gt;neptune# ifconfig eth0:0 192.168.128.130 up&lt;/code&gt;&lt;/pre&gt;Let's further clear the ARP mapping for the virtual IP on mercury using the handy arp -d command and reconnect to MySQL.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;mercury # arp -d 192.168.128.130&lt;br /&gt;root@logos1:~# mysql -utungsten -psecret -h192.168.128.130&lt;br /&gt;Welcome to the MySQL monitor.&amp;nbsp; Commands end with ; or \g.&lt;br /&gt;Your MySQL connection id is 294&lt;br /&gt;...&lt;br /&gt;mysql&amp;gt;&lt;/code&gt;&lt;/pre&gt;So far so good--we logged into MySQL and everything appears normal.&amp;nbsp; But in fact it is not normal at all.&amp;nbsp; If you run tcpdump and watch the ARP requests during this login, you see the following:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;# tcpdump -n -i eth0 arp &lt;br /&gt;tcpdump: verbose output suppressed, use -v or -vv for full protocol decode&lt;br /&gt;listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes&lt;br /&gt;09:59:32.643518 ARP, Request who-has 192.168.128.130 tell 192.168.128.110, length 28&lt;br /&gt;09:59:32.643768 ARP, Reply 192.168.128.130 is-at 08:00:27:68:cd:7d, length 46&lt;br 
/&gt;09:59:32.643793 ARP, Reply 192.168.128.130 is-at 08:00:27:ce:5f:8e, length 46 &lt;/code&gt;&lt;/pre&gt;This is not just bad--it's &lt;i&gt;&lt;u&gt;very bad&lt;/u&gt;&lt;/i&gt;.&amp;nbsp; Both saturn and neptune responded to mercury's ARP request!&amp;nbsp; Mercury can pick only one for the mapping; which one it picks depends on timing as well as the exact implementation of the TCP/IP stack.&amp;nbsp; In other words, we have a race condition and the winner is essentially random.&lt;br /&gt;&lt;br /&gt;You can demonstrate the randomness for yourself with a simple experiment.&amp;nbsp; Let's create a test script named mysql-arp-flush.sh, which clears the ARP cache entry for the VIP and then connects to MySQL, all in a loop. &amp;nbsp; &lt;br /&gt;&lt;pre&gt;#!/bin/bash&lt;br /&gt;for i in {1..5}; &lt;br /&gt;do &lt;br /&gt;  arp -d 192.168.128.130&lt;br /&gt;  sleep 1&lt;br /&gt;  mysql -utungsten -psecret -h192.168.128.130 -N \&lt;br /&gt;    -e "show variables like 'host%'"&lt;br /&gt;done&lt;/pre&gt;If you run the script, you'll see results like the following.&amp;nbsp; Note the random switch to Neptune on the fourth connection. 
&amp;nbsp; &lt;br /&gt;&lt;pre&gt;# ./mysql-arp-flush.sh&lt;br /&gt;+----------+---------+&lt;br /&gt;| hostname | saturn  | &lt;br /&gt;+----------+---------+&lt;br /&gt;+----------+---------+&lt;br /&gt;| hostname | saturn  | &lt;br /&gt;+----------+---------+&lt;br /&gt;+----------+---------+&lt;br /&gt;| hostname | saturn  | &lt;br /&gt;+----------+---------+&lt;br /&gt;+----------+---------+&lt;br /&gt;| hostname | neptune |&lt;br /&gt;+----------+---------+&lt;br /&gt;+----------+---------+&lt;br /&gt;| hostname | saturn  | &lt;br /&gt;+----------+---------+&lt;/pre&gt;At this point you have successfully created a split-brain.&amp;nbsp; If you use database replication and both databases are open for writes, as would be the default case with MySQL replication, Tungsten, or any of the PostgreSQL replication solutions like Londiste, your applications will randomly connect to each DBMS server.&amp;nbsp; Your data will quickly become irreparably mixed up.&amp;nbsp; All you can do is hope that the problem will be discovered quickly.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;b&gt;A Half-Hearted Solution using Gratuitous ARP&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;You might think that it would be handy if the ARP protocol provided a way to get around split-brain problems by invalidating client host ARP caches.&amp;nbsp; In fact, there is such a feature in ARP--it's called gratuitous ARP.&amp;nbsp; While useful, it is not a solution for split-brain issues.&amp;nbsp; Let's look closely to see why.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Gratuitous ARP works by sending an unsolicited ARP response to let hosts on the LAN know that an IP address mapping has changed.&amp;nbsp; On Linux systems you can issue the arping command as shown below to generate a gratuitous ARP response.&amp;nbsp; &lt;br /&gt;&lt;pre&gt;&lt;code&gt;neptune# arping -q -c 3 -A -I eth0 192.168.128.130&lt;/code&gt;&lt;/pre&gt;This tells host neptune to send 3 ARP reply messages with its MAC address for the VIP 
address.&amp;nbsp; If we look at tcpdump output again, we see the following:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;# tcpdump -n -i eth0 arp &lt;br /&gt;tcpdump: verbose output suppressed, use -v or -vv for full protocol decode&lt;br /&gt;listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes&lt;br /&gt;11:02:27.154279 ARP, Reply 192.168.128.130 is-at 08:00:27:68:cd:7d, length 46&lt;br /&gt;11:02:28.159291 ARP, Reply 192.168.128.130 is-at 08:00:27:68:cd:7d, length 46&lt;br /&gt;11:02:29.162403 ARP, Reply 192.168.128.130 is-at 08:00:27:68:cd:7d, length 46&lt;/code&gt;&lt;/pre&gt;Linux hosts that receive the response will generally then update their ARP caches, though as we will see, there are some important exceptions.&amp;nbsp;&amp;nbsp; But first, we need to show the effect of gratuitous ARP on MySQL connections.&amp;nbsp; Let's start with the following ARP cache contents on host mercury.&amp;nbsp; This shows an existing mapping to the MAC address of host neptune, which is what we expect from the previous arping command on neptune.&amp;nbsp; &lt;br /&gt;&lt;pre&gt;&lt;code&gt;mercury# arp -an&lt;br /&gt;? (192.168.128.130) at 08:00:27:68:cd:7d [ether] on eth0&lt;br /&gt;? (192.168.128.41) at 00:25:00:44:f3:ce [ether] on eth0&lt;br /&gt;? (192.168.128.1) at 00:0f:cc:74:64:5c [ether] on eth0&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;Next, we run a loop that connects to MySQL and prints the host name every second.&amp;nbsp; The loop code is shown below and stored in a script named mysql-no-arp-flush.sh.&amp;nbsp; Unlike the previous script this &lt;i&gt;does not&lt;/i&gt; release the ARP cache mapping between connections to MySQL. 
&lt;br /&gt;&lt;pre&gt;#!/bin/bash&lt;br /&gt;for i in {1..30}; &lt;br /&gt;do &lt;br /&gt;  sleep 1&lt;br /&gt;  mysql -utungsten -psecret -h192.168.128.130 -N \&lt;br /&gt;    -e "show variables like 'host%'"&lt;br /&gt;done&lt;/pre&gt;While the test script is running, we run an arping command from saturn.&amp;nbsp; &lt;br /&gt;&lt;code&gt;saturn# arping -q -c 3 -A -I eth0 192.168.128.130&lt;/code&gt;&lt;br /&gt;What we see in the MySQL output is the following.&amp;nbsp; Once the gratuitous ARP is received, mercury switches its connection from neptune to saturn and stays there, at least for the time being. &lt;br /&gt;&lt;pre&gt;&lt;code&gt;mercury# ./mysql-no-arp-flush.sh&lt;br /&gt;+----------+---------+&lt;br /&gt;| hostname | neptune | &lt;br /&gt;+----------+---------+&lt;br /&gt;+----------+---------+&lt;br /&gt;| hostname | neptune | &lt;br /&gt;+----------+---------+&lt;br /&gt;+----------+---------+&lt;br /&gt;| hostname | saturn  | &lt;br /&gt;+----------+---------+&lt;br /&gt;+----------+---------+&lt;br /&gt;| hostname | saturn  | &lt;br /&gt;+----------+---------+&lt;/code&gt;&lt;/pre&gt;There is one more interesting property of gratuitous ARP responses.&amp;nbsp; If you issue one during a session, it will cause client sessions to switch between hosts without waiting for a timeout.&amp;nbsp; Here's an example.&amp;nbsp; First, log in to MySQL and see which host we are on.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;root@logos1:~# mysql -utungsten -psecret -h192.168.128.130&lt;br /&gt;Welcome to the MySQL monitor.  
Commands end with ; or \g.&lt;br /&gt;Your MySQL connection id is 33853&lt;br /&gt;mysql&amp;gt; show variables like 'hostname';&lt;br /&gt;+---------------+---------+&lt;br /&gt;| Variable_name | Value   |&lt;br /&gt;+---------------+---------+&lt;br /&gt;| hostname      | neptune | &lt;br /&gt;+---------------+---------+&lt;br /&gt;1 row in set (0.00 sec)&lt;/code&gt;&lt;/pre&gt;Now issue an arping command on saturn using a separate window.&lt;br /&gt;&lt;code&gt;saturn# arping -q -c 3 -A -I eth0 192.168.128.130&lt;/code&gt;&lt;br /&gt;Finally, go back and check the host name again in the MySQL session.&amp;nbsp;&amp;nbsp; The session switches over to the other server, which you see at the client level as a lost connection followed by a reconnect.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;mysql&amp;gt; show variables like 'hostname'; &lt;br /&gt;ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id:   33854 &lt;br /&gt;Current database: *** NONE ***  &lt;br /&gt;+---------------+--------+ &lt;br /&gt;| Variable_name | Value  | &lt;br /&gt;+---------------+--------+ &lt;br /&gt;| hostname      | saturn |  &lt;br /&gt;+---------------+--------+ &lt;br /&gt;1 row in set (0.00 sec)&lt;/code&gt;&lt;/pre&gt;So is gratuitous ARP the solution to virtual IP split-brain?&amp;nbsp; It announces that there is a mapping change, which can make failover work much more quickly.&amp;nbsp; This is useful in its own right.&amp;nbsp; However, it does not solve split-brain.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;First, not all TCP/IP stacks even recognize gratuitous ARP responses.&amp;nbsp; Second, gratuitous ARP only takes effect on hosts that actually have a current mapping in their ARP cache.&amp;nbsp; Other hosts will&amp;nbsp; wait until they actually need the mapping and then issue a new ARP request.&amp;nbsp; Finally, ARP mappings automatically time out after a few minutes. 
&amp;nbsp; In that case the host will issue a new ARP request, which, as in the two preceding cases, brings us right back to the split-brain scenario we were trying to cure.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Avoiding Virtual IP Split-Brains&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Avoiding a VIP-induced split-brain is not a simple problem.&amp;nbsp; The best approach is a combination of sound cluster management, amelioration, and paranoia.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Proper cluster management is the first line of defense.&amp;nbsp; VIPs are an example of a unique resource in the system that only one host may hold at a time.&amp;nbsp;&amp;nbsp; An old saying that has been attributed to everyone from Genghis Khan to Larry Ellison sums up the problem succinctly:&lt;br /&gt;&lt;blockquote&gt;&lt;i&gt;It is not enough to succeed.&amp;nbsp; All others must fail.&amp;nbsp;&amp;nbsp;&lt;/i&gt;&lt;/blockquote&gt;The standard technique to implement this policy is called &lt;a href="http://linux-ha.org/wiki/STONITH"&gt;STONITH&lt;/a&gt;, which stands for "Shoot the other node in the head."&amp;nbsp; Basically, it means that before one node acquires the virtual IP address, the cluster manager must make every effort to ensure no other node has it, using violent means if necessary.&amp;nbsp;&amp;nbsp; Moving the VIP thus has the following steps.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The cluster manager executes a procedure to drop the VIP on all other hosts (for example, using ssh or by cutting off power).&amp;nbsp; Once these procedures are complete, the cluster manager executes a command to assign the VIP to the new owner.&amp;nbsp; &lt;/li&gt;&lt;li&gt;Isolated nodes automatically release their VIP.&amp;nbsp; "Isolated" is usually defined as being cut off from the cluster manager and unable to access certain agreed-upon network resources such as routers or public DNS servers.&amp;nbsp; &lt;/li&gt;&lt;li&gt;In cases of doubt, everybody stops.&amp;nbsp; For most systems it is far 
better to be unavailable than to mix up data randomly. &amp;nbsp; &lt;/li&gt;&lt;/ol&gt;Cluster managers like &lt;a href="http://www.continuent.com/downloads/software"&gt;Tungsten&lt;/a&gt; and &lt;a href="http://linux-ha.org/wiki/Main_Page"&gt;Pacemaker&lt;/a&gt; handle this kind of process very well. &amp;nbsp; Pacemaker, for example, has a number of specialized hooks to cut power or otherwise use extreme violence to ensure proper fencing of databases.&amp;nbsp; Tungsten has fewer such hooks but has a much richer set of operations for databases and also has a wide set of connectivity options for HA besides using VIPs. &lt;br /&gt;&lt;br /&gt;Incidentally, you want to be very wary about re-inventing the wheel, especially when it comes to DBMS clustering and high availability.&amp;nbsp; Clustering has a lot of non-obvious corner cases; even the "easy" problems like planned failover are quite hard to implement correctly.&amp;nbsp; You are almost always better off using something that already exists instead of trying to roll your own solution.&lt;br /&gt;&lt;br /&gt;Amelioration is the next line of defense, namely to make split-brain situations less dangerous when they actually occur.&amp;nbsp; Failover designs that use shared disks or non-writable slaves (e.g., with DRBD or PostgreSQL streaming replication) have a degree of protection because it is somewhat harder to have multiple databases open for writes.&amp;nbsp; It is still definitely possible, though, and the cluster manager is your best bet to prevent this.&amp;nbsp; When using MySQL with either native or Tungsten replication, by contrast, databases are open for writes and therefore susceptible to data corruption unless you ensure slaves are not writable.&lt;br /&gt;&lt;br /&gt;Fortunately, this is very easy to do in MySQL.&amp;nbsp;&amp;nbsp; To make a database read-only to all accounts other than those with SUPER privilege, just issue the following commands to make the server read-only and ensure the setting is in effect:&lt;br 
/&gt;&lt;pre&gt;&lt;code&gt;neptune# mysql -uroot -e "set global read_only=1"&lt;br /&gt;neptune# mysql -uroot -e "show variables like 'read_only'"&lt;br /&gt;+---------------+-------+&lt;br /&gt;| Variable_name | Value |&lt;br /&gt;+---------------+-------+&lt;br /&gt;| read_only     | ON    | &lt;br /&gt;+---------------+-------+&lt;/code&gt;&lt;/pre&gt;This protects you not just in cases of actual failover but also from administrative mistakes or software failures that switch the VIP by accident.&amp;nbsp;&amp;nbsp; Many cluster managers implement read-only slaves.&amp;nbsp;&amp;nbsp; &lt;a href="http://www.continuent.com/downloads/software/register/login"&gt;Tungsten clustering&lt;/a&gt; has explicit support for read-only slaves.&amp;nbsp; Other cluster managers like &lt;a href="http://mysql-mmm.org/"&gt;MMM&lt;/a&gt; and Pacemaker can do the same. &lt;br /&gt;&lt;br /&gt;Lastly paranoia is always a good thing.&amp;nbsp; You should test the daylights out of clusters that depend on VIPs before deployment, and also check regularly afterwards to ensure there are no unexpected writes on slaves.&amp;nbsp; Regular checks of logs are a good idea.&amp;nbsp; Another good way to check for problems in MySQL master/slave setups is to run consistency checks.&amp;nbsp;&amp;nbsp; Tungsten Replicator has built-in consistency checking designed for exactly this purpose.&amp;nbsp; You can also run &lt;a href="http://www.maatkit.org/doc/mk-table-checksum.html"&gt;Maatkit mk-table-checksum&lt;/a&gt; at regular intervals.&amp;nbsp;&amp;nbsp; Another best practice is to "flip" masters and slaves on a regular basis to ensure switch and failover procedures work properly.&amp;nbsp;&amp;nbsp; Don't avoid trouble--look for it!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Conclusion and Note on Sources&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Virtual IP addresses are a convenient way to set up database high availability but can lead to very severe split-brain situations if used incorrectly.&amp;nbsp;&amp;nbsp; 
To deploy virtual IP addresses without problems you must first of all understand how they work and second use a sound cluster management approach that avoids split-brain and minimizes the impact if it does occur.&amp;nbsp; As with all problems of this kind you need to test the implementation thoroughly before deployment as well as regularly during operations.&amp;nbsp; This will help avoid nasty surprises and corrupt data that are otherwise all but inevitable.&lt;br /&gt;&lt;br /&gt;Finally it is worth talking a bit about sources.&amp;nbsp; I wrote this article because I could not find a single location that explained virtual IP addresses in a way that drew out the consequences of their behavior for database failover.&amp;nbsp; That said, there are a couple of good general sources for information on Internet tools and high availability:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://linux-ip.net/"&gt;http://linux-ip.net/&lt;/a&gt; -- Guide to IP Layer Network Administration with Linux.&amp;nbsp;&lt;/li&gt;&lt;li&gt;&lt;a href="http://linux-ha.org/"&gt;http://linux-ha.org/&lt;/a&gt; -- Linux HA wiki pages&lt;/li&gt;&lt;/ul&gt;Beyond that you can look at general networking sources like Radia Perlman's &lt;i&gt;Interconnections, Second Edition&lt;/i&gt; or &lt;i&gt;Internetworking with TCP/IP&lt;/i&gt; by Douglas Comer.&amp;nbsp; These are more high-level.&amp;nbsp; If you get really desperate for details, try the RFCs, for example &lt;a href="http://tools.ietf.org/html/rfc826"&gt;RFC-826&lt;/a&gt;, which is the original specification for ARP.&amp;nbsp; Some of them are surprisingly good reads even 30 years after the fact.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-6791807547638633260?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://scale-out-blog.blogspot.com/feeds/6791807547638633260/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=6791807547638633260' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/6791807547638633260'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/6791807547638633260'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/01/virtual-ip-addresses-and-their.html' title='Virtual IP Addresses and Their Discontents for Database Availability'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_26KnjtB2MFo/TUU44yNMVXI/AAAAAAAAAGU/gs_WVUWxTEg/s72-c/vip-based-failover.jpg' height='72' width='72'/><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-6146505396844945146</id><published>2011-01-25T23:14:00.000-08:00</published><updated>2011-01-25T23:14:00.276-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='MariaDB'/><category scheme='http://www.blogger.com/atom/ns#' term='Drizzle'/><title type='text'>Tungsten Replicator Overview Webinar</title><content type='html'>On Thursday January 27th at 10am PST I will be doing a webinar on &lt;a href="http://www.continuent.com/images/stories/pdfs/tungsten-replicator-guide.pdf"&gt;Tungsten Replicator&lt;/a&gt; together with my colleague &lt;a 
href="http://datacharmer.org/"&gt;Giuseppe Maxia&lt;/a&gt;.&amp;nbsp; The title is "What MySQL Replication Cannot Do.&amp;nbsp; And How to Get Around It."&amp;nbsp; Basically it is a nuts and bolts description of Tungsten Replicator capabilities like multi-master replication, failover, parallel apply, and using replication for zero-downtime upgrade.&amp;nbsp; If you have ever wanted an in-depth look at the Tungsten Replicator this is a good opportunity.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;During 2010 we implemented an amazing number of new replication features ranging from pipelines early in the year to fast disk logs, multiple replication services per process, bi-directional replication, and parallel apply by the end.&amp;nbsp; We will be building out all of these in the coming year and releasing increasingly capable features into open source as well.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;This presentation is part of Continuent's regular webinar series which means we will also talk a bit about commercial products and services at the end.&amp;nbsp; However, it's mostly cool replication stuff.&amp;nbsp;&amp;nbsp; You can sign up on the &lt;a href="http://www.continuent.com/news/webinars"&gt;Continuent webinar page&lt;/a&gt;.&amp;nbsp; Hope to see you there.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-6146505396844945146?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/6146505396844945146/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=6146505396844945146' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/6146505396844945146'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/768233104244702633/posts/default/6146505396844945146'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/01/tungsten-replicator-overview-webinar.html' title='Tungsten Replicator Overview Webinar'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-3284087075841814243</id><published>2011-01-11T22:30:00.000-08:00</published><updated>2011-01-21T14:54:07.799-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='MariaDB'/><title type='text'>Fixing Replication with Replication</title><content type='html'>A couple of days ago I ran into a Tungsten Replicator case where several MySQL tables became corrupted on slaves and needed to be restored from the master.&amp;nbsp;&amp;nbsp; We identified the tables that had problems fairly quickly using Tungsten Replicator's consistency checks.&amp;nbsp; However, that led to another problem:&amp;nbsp; how to restore the slave tables efficiently from the master.&amp;nbsp; The MySQL server in question processes around 10M transactions per day--there is virtually no downtime.&amp;nbsp; Though the tables were not large, we could not be sure whether they were in use.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Fortunately, you can use a simple MySQL trick to get all the rows of a table to replicate through to slaves.&amp;nbsp; The idea is to dump the table, delete the rows, then reload it again.&amp;nbsp; The delete and subsequent reload replicate out to slaves, after which everything is consistent again.&amp;nbsp; Let's 
say we have a table called tpcb.history that needs to be fixed.&amp;nbsp; Log in with mysql and run the following commands:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;BEGIN;&lt;br /&gt;SELECT * FROM tpcb.history &lt;br /&gt;  INTO OUTFILE '/tmp/tpcb.history.dmp' FOR UPDATE;&lt;br /&gt;DELETE FROM tpcb.history;&lt;br /&gt;LOAD DATA INFILE '/tmp/tpcb.history.dmp' REPLACE&lt;br /&gt;  INTO TABLE tpcb.history FIELDS TERMINATED BY '\t'&lt;br /&gt;  LINES TERMINATED BY '\n';&lt;br /&gt;COMMIT;&lt;/code&gt;&lt;/pre&gt;You can do the reload several ways in MySQL, but this particular code has some advantages over other approaches, such as using LOCK TABLES.&amp;nbsp; First, it uses a transaction, so if something goes wrong the changes roll back and you do not lose your data.&amp;nbsp; Second, the SELECT ... FOR UPDATE locks your data and ensures serialization.&amp;nbsp; You can run this while applications are running without problems. &lt;br /&gt;&lt;br /&gt;This seems useful enough that I put together a simple script called &lt;a href="http://tungsten.svn.sourceforge.net/viewvc/tungsten/trunk/replicator/samples/scripts/refresh/reload-table.sh?revision=2536&amp;amp;content-type=text%2Fplain"&gt;reload-table.sh&lt;/a&gt; with a &lt;a href="http://tungsten.svn.sourceforge.net/viewvc/tungsten/trunk/replicator/samples/scripts/refresh/README?revision=2536&amp;amp;view=markup"&gt;README&lt;/a&gt; and checked them into the Tungsten Replicator codeline on SourceForge.net.&amp;nbsp; You can refresh the same table shown above using the following command:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;[sudo] ./reload-table-b.sh -u tungsten -p secret -t tpcb.history&lt;/code&gt;&lt;/pre&gt;I tested the reload using Tungsten 1.3.1 on MySQL 5.1 with statement  replication.&amp;nbsp; However, it would work equally well with row replication.&amp;nbsp;  Moreover, you can do the same trick in MySQL replication, as this  involves base replication capabilities that are directly equivalent.&amp;nbsp; There are a few 
caveats:&amp;nbsp; you need to use InnoDB (or another transactional engine), large tables may be a problem, and you would need to tread carefully in cases where tables contain referential constraints.&amp;nbsp; Finally, it would be wise to save the master table somewhere else before running the script.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-3284087075841814243?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/3284087075841814243/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=3284087075841814243' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3284087075841814243'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3284087075841814243'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2011/01/fixing-replication-with-replication.html' title='Fixing Replication with Replication'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-1895306719036544861</id><published>2010-12-12T21:52:00.000-08:00</published><updated>2010-12-12T21:52:00.922-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='MariaDB'/><category 
scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Drizzle'/><title type='text'>Interested in Sponsoring Tungsten Open Source Features?</title><content type='html'>Over the last few months I have been pleasantly surprised by the number of people using open source builds of Tungsten.&amp;nbsp; My company, Continuent, has therefore started to offer support for open source users and will likely expand these services to meet demand. &lt;br /&gt;&lt;br /&gt;There have also been a number of requests to add specific features to open source builds, especially for replication. We have added a few already but are now considering pushing even more features into open source if we can find sponsors.&amp;nbsp; These add to a number of great features already in open source like global transaction IDs, MySQL 5.0/5.1 support, basic Drizzle replication, transaction filtering, and many others.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Do you have special replication or clustering features you would like to see added to Tungsten? Specialized MySQL to PostgreSQL replication?&amp;nbsp; Management and monitoring commands?&amp;nbsp; Cool parallel replication problems?&amp;nbsp; High-performance logging?&amp;nbsp; Weird multi-master topologies?&amp;nbsp; Talk to us about sponsoring new open source features.&amp;nbsp; We're happy to do projects that solve interesting problems, benefit the open source databases community, and help grow Tungsten as a product.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;Visit the Continuent website or send email directly to robert dot hodges at continuent dot com. 
&amp;nbsp;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-1895306719036544861?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/1895306719036544861/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=1895306719036544861' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/1895306719036544861'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/1895306719036544861'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2010/12/interested-in-sponsoring-tungsten-open.html' title='Interested in Sponsoring Tungsten Open Source Features?'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-202789508588586724</id><published>2010-11-07T09:15:00.000-08:00</published><updated>2010-11-07T09:15:00.266-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MariaDB'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Drizzle'/><category scheme='http://www.blogger.com/atom/ns#' term='IT Industry'/><title type='text'>It's All about the Team</title><content type='html'>Earlier this week Giuseppe Maxia &lt;a 
href="http://datacharmer.blogspot.com/2010/11/qa-at-continuent-serendipitous-job.html"&gt;blogged about joining Continuent as Director of QA&lt;/a&gt;.&amp;nbsp; Creating high quality systems for distributed data management is a hard but fascinating problem.&amp;nbsp; I have been hooked on it myself for many years.&amp;nbsp; Giuseppe brings the creativity as well as humor our team needs to nail this problem completely.&amp;nbsp; I'm therefore delighted to know he will be focused on it. &lt;br /&gt;&lt;br /&gt;That said, I'm even happier for another reason.&amp;nbsp; Beyond solving any single problem, Giuseppe strengthens an already strong team.&amp;nbsp; Ed Catmull of Pixar gave a great speech a few years ago about &lt;a href="http://www.youtube.com/watch?v=k2h2lvhzMDc"&gt;managing creative teams and why successful companies eventually fail&lt;/a&gt;.&amp;nbsp; Among other things he asked the question whether it is the idea or the people who implement it that count most.&amp;nbsp; His conclusion:&amp;nbsp; great teams implement good ideas to build great products.&amp;nbsp; But even more important, great teams can turn bad ideas into good ones, then go on to build great products from those ideas too.&amp;nbsp; Pixar has proved this many times over. 
&lt;br /&gt;&lt;br /&gt;I believe strongly in the power of great teams to create great products.&amp;nbsp; Giuseppe, welcome to the team.&amp;nbsp;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-202789508588586724?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/202789508588586724/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=202789508588586724' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/202789508588586724'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/202789508588586724'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2010/11/its-all-about-team.html' title='It&apos;s All about the Team'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-1429183395647695986</id><published>2010-10-24T11:01:00.000-07:00</published><updated>2010-10-24T16:18:57.042-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='MariaDB'/><category scheme='http://www.blogger.com/atom/ns#' term='SaaS'/><category scheme='http://www.blogger.com/atom/ns#' term='Drizzle'/><title type='text'>Parallel Replication on MySQL:  Report from the 
Trenches</title><content type='html'>Single-threaded apply is one of the big downsides of MySQL's built-in replication, as &lt;a href="http://www.mysqlperformanceblog.com/2010/10/20/mysql-limitations-part-1-single-threaded-replication/"&gt;Baron Schwartz pointed out&lt;/a&gt; a couple of days ago.&amp;nbsp; While a master can process dozens of updates at once, slaves must apply them one after the other on a single thread.&amp;nbsp; Add in disk I/O, and the result is very slow performance indeed.&amp;nbsp; The obvious answer is &lt;i&gt;parallel apply&lt;/i&gt;, namely writing multiple non-conflicting updates to the slave at once.&lt;br /&gt;&lt;br /&gt;I have spent the last few months implementing parallel apply for Tungsten 2.0, which we are now testing at customer sites.&amp;nbsp; In this article I would like to describe how Tungsten's parallel apply works as well as some of the lessons that have become apparent through the implementation.&lt;br /&gt;&lt;br /&gt;There are a couple of big challenges in parallel apply.&amp;nbsp; There is of course the practical problem of separating transactions into parallel streams, for example splitting them by database.&amp;nbsp; This is known as &lt;i&gt;sharding&lt;/i&gt;. 
&amp;nbsp; Row updates are easy enough but MySQL also has statement replication.&amp;nbsp; Transactions with statements require parsing, and there are ambiguous cases.&amp;nbsp; If that's not enough, features like LOAD DATA INFILE have a complex implementation in the binlog and require specialized logic to shard correctly.&amp;nbsp; In addition, parallel apply of any kind has a lot of corner cases that you have to solve completely or risk unpredictable failures.&amp;nbsp; Here's an example:&amp;nbsp; skipping transactions on the slave.&amp;nbsp; You have to wait for the event, but what if some of the threads are already past it when you ask to skip?&amp;nbsp; How do you synchronize access to the list of transactions to skip without creating a choke point for threads?&amp;nbsp;&amp;nbsp; &lt;br /&gt;&lt;br /&gt;The next challenge is performance.&amp;nbsp; Parallel replication offers a rich choice of ways to lower throughput, not raise it.&amp;nbsp; Multiple disk logs are the best I have found so far, as they can convert sequential reads and writes on the disk log to random I/O when more replication threads contend for different parts of the disk.&amp;nbsp; Implementing multiple queues in memory is far faster and simpler but limits the queue sizes.&amp;nbsp; Another excellent way to slow things down is to try to parallelize SQL transactions with a lot of dependencies, which means you end up effectively serialized *and* paying the extra cost of parsing transactions and synchronizing threads.&amp;nbsp; In this case it can be better to keep everything sequential but use block commit to apply 50 or 100 transactions simultaneously on the slave. 
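&lt;br /&gt;&lt;br /&gt;To make block commit concrete, here is a minimal sketch of what the slave ends up executing.&amp;nbsp; The table names (customer_a.orders, customer_b.orders) and the block of three transactions are hypothetical, chosen only for illustration; this is not actual Tungsten output.&amp;nbsp; The point is simply that several replicated transactions are applied inside one local transaction and share a single commit.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;-- Illustrative only: hypothetical tables, not actual Tungsten output&lt;br /&gt;BEGIN;&lt;br /&gt;-- replicated transaction 1&lt;br /&gt;UPDATE customer_a.orders SET status = 'shipped' WHERE id = 101;&lt;br /&gt;-- replicated transaction 2&lt;br /&gt;INSERT INTO customer_b.orders (id, status) VALUES (7, 'new');&lt;br /&gt;-- replicated transaction 3&lt;br /&gt;DELETE FROM customer_a.orders WHERE id = 55;&lt;br /&gt;-- one commit covers the whole block&lt;br /&gt;COMMIT;&lt;/code&gt;&lt;/pre&gt;The saving comes from paying the commit cost once per block instead of once per transaction, which is why it can help even when the transactions themselves cannot run in parallel.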
&lt;br /&gt;&lt;br /&gt;With all that said, the parallel apply problem is still quite tractable, but you need to pick your battles carefully.&amp;nbsp; Tungsten's parallel apply implementation has a very clear problem focus:&amp;nbsp; speeding up slave updates for multi-tenant applications that have a high degree of natural partitioning and concurrent updates across customers.&amp;nbsp; This is not as limiting as it might sound to readers unfamiliar with MySQL.&amp;nbsp; SaaS applications for the most part follow the multi-tenant model on MySQL, with each customer assigned to a particular database.&amp;nbsp; So do large ISPs or cloud providers that host customers on shared servers using separate databases. &lt;br /&gt;&lt;br /&gt;Tungsten parallel apply is based on automatic sharding of transactions.&amp;nbsp;&amp;nbsp; The following diagram shows the parallel apply algorithm conceptually.&amp;nbsp; &lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;img border="0" height="191" src="http://4.bp.blogspot.com/_26KnjtB2MFo/TMMLmwZgFqI/AAAAAAAAAGA/iiTwBYg9chY/s400/Parallel-Apply-Algorithm.png" style="margin-left: auto; margin-right: auto;" width="400" /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Tungsten Parallel Apply&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_26KnjtB2MFo/TMMLmwZgFqI/AAAAAAAAAGA/iiTwBYg9chY/s1600/Parallel-Apply-Algorithm.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;/a&gt;&lt;/div&gt;Tungsten has a flexible architecture based on &lt;i&gt;replication pipelines&lt;/i&gt;, described in a &lt;a 
href="http://scale-out-blog.blogspot.com/2010/04/customized-data-movement-with-tungsten.html"&gt;previous article on this blog&lt;/a&gt;.&amp;nbsp; To recap the model, pipelines are divided into stages, which represent processing steps.&amp;nbsp; Each stage consists of an extract-filter-apply loop with symmetric interfaces and identical processing logic for each stage.&amp;nbsp; The parallel apply implementation builds on replication pipelines as follows:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;A new filter called EventMetadataFilter automatically parses incoming transactions to figure out which database(s) they affect.&amp;nbsp; This is simple for row updates but involves parsing for statements and specialized extract handling for odd-ball operations like LOAD DATA INFILE.&amp;nbsp; &lt;/li&gt;&lt;li&gt;The shard ID is assigned from the database name. This is glommed into the EventMetadataFilter but will shortly be broken out into a separate filter so that it is possible to support alternate shard assignment algorithms.&amp;nbsp; &lt;/li&gt;&lt;li&gt;There is a new kind of in-memory buffer between stages called a ParallelQueue that supports multiple queues that feed the final apply stage.&amp;nbsp;&amp;nbsp; Stages have a corresponding extension to allow them to have multiple threads, which must match the number of parallel queues or you get an error.&amp;nbsp; &lt;/li&gt;&lt;li&gt;The ParallelQueue implementation calls a new component called a Partitioner to assign transactions a partition number (i.e., a parallel queue).&amp;nbsp; You can substitute different algorithms by providing different partitioner implementations.&amp;nbsp; The default implementation uses a configuration file called shard.list to map shards to queues.&amp;nbsp; Unless you say otherwise it hashes on the shard ID to make this assignment. 
&lt;/li&gt;&lt;/ol&gt;Extensions #1 and #2 run on the master, while #3 and #4 run on the slave.&amp;nbsp; I really like diagrams, so here is a picture of the fully implemented parallel apply architecture.&amp;nbsp; The master replicator extracts, assigns the shard, and logs each transaction.&amp;nbsp; The slave replicator fetches transactions, logs them locally, then applies in parallel. &lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;img border="0" height="193" src="http://3.bp.blogspot.com/_26KnjtB2MFo/TMPIkodbz8I/AAAAAAAAAGE/nKGG7k_5Tvk/s640/Parallel-Master-Slave-Arch.png" style="margin-left: auto; margin-right: auto;" width="640" /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Full Master/Slave Architecture for Parallel Apply&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_26KnjtB2MFo/TMPIkodbz8I/AAAAAAAAAGE/nKGG7k_5Tvk/s1600/Parallel-Master-Slave-Arch.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;/a&gt;&lt;/div&gt;So how does this work?&amp;nbsp; Pretty well actually.&amp;nbsp; Local lab tests indicate that parallel apply roughly doubles throughput on a multi-database TPC-B benchmark we use for testing. 
&amp;nbsp; We should be able to publish some real-world performance numbers in the near future, but so far things look quite promising.&amp;nbsp; During the implementation a number of interesting issues have arisen, which I would like to discuss now.&lt;br /&gt;&lt;br /&gt;The first issue is the ratio between parallel apply threads and shards.&amp;nbsp; While it might seem obvious to have a thread per shard, in real deployments the situation is not so clear.&amp;nbsp; For one thing actual deployments in SaaS and ISP situations often have hundreds or even thousands of databases, which has a number of practical consequences for implementation.&amp;nbsp; Less obviously, spreading transactions thinly across a lot of queues means fewer opportunities to use block commit, hence more work for slave servers and less overall throughput.&amp;nbsp; Performance optimization is a very uncertain matter, so Tungsten lets users configure the ratio. &lt;br /&gt;&lt;br /&gt;Dependencies between shards are yet another issue.&amp;nbsp; While I mentioned that Tungsten is designed for applications with "a high degree of natural partitioning," dependencies between databases as well as individual transactions do occur and cannot be ignored.&amp;nbsp; For example, many SaaS applications have reference data that are used by all customer databases.&amp;nbsp; Even if parallel SQL works here, applications may get sick from seeing updates appear in the wrong order.&amp;nbsp; Or you could have global operations like CREATE USER that affect all databases.&amp;nbsp; Or you might not be able to tell which shard a piece of SQL belongs to.&amp;nbsp; Tungsten allows users to declare reference databases and automatically serializes these databases as well as global or "don't know" cases.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;There are also numerous issues around startup and shutdown.&amp;nbsp; Remember how MySQL replication slaves will not restart after unclean shutdown with open temp tables?&amp;nbsp; (If 
not, take a quick break and &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/replication-features-temptables.html"&gt;read this now&lt;/a&gt;.&amp;nbsp; You'll thank me later.)&amp;nbsp; Parallel apply introduces similar issues, because you have multiple threads all updating different positions in the database.&amp;nbsp; Tungsten handles crash recovery by tracking the apply position of each queue in InnoDB and then recommencing from that point on restart in each queue.&amp;nbsp; I am putting finishing touches on clean shutdown, which ensures that all queues are empty, much like automatically checking that temp tables are closed on MySQL.&amp;nbsp;&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;In short, over the last few months Tungsten has climbed a fair distance up a pretty big hill to get parallel apply to work.&amp;nbsp; The flexibility of the replicator architecture, particularly pipelines, has been very helpful as it is quite easy to extend.&amp;nbsp; The parallelization algorithm builds on terrific work by other colleagues at Continuent, especially Stephane Giron and Linas Virbalas.&amp;nbsp; They have both put enormous effort into building up MySQL and PostgreSQL replication capabilities.&lt;br /&gt;&lt;br /&gt;Here are a couple of parting thoughts about parallelization based on the experience so far.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;Thought number one&lt;/u&gt;:&amp;nbsp; parallel replication is not magic.&amp;nbsp; To use parallel apply effectively, applications need to play nice:&amp;nbsp; mostly short transactions and not too many dependencies between shards are the biggest requirements to see a substantial boost in throughput.&amp;nbsp; For example, if you let one user write 50M statements to the binlog in a single transaction, things are going to get kind of quiet on the slave no matter what you do.&amp;nbsp; Also, you can forget about MyISAM or other non-transactional engines.&amp;nbsp; As I have written before, these engines offer a number of 
opportunities for databases to get messed up or out-of-sync even using conventional MySQL replication.&amp;nbsp; Tungsten's block commit and parallel apply increase the window for problems significantly.&amp;nbsp; If you are still using MyISAM for replicated data, it's time to man up and convert to InnoDB.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;u&gt;Thought number two&lt;/u&gt;: The long-term answer to effective parallel replication is to change how MySQL works by interleaving transactions within the binlog &lt;a href="http://kristiannielsen.livejournal.com/14305.html"&gt;along the lines suggested by Kristian Nielsen&lt;/a&gt; and others.&amp;nbsp; MySQL currently completely serializes transactions to the binlog, an accomplishment that makes slave apply logic a lot simpler.&amp;nbsp;&amp;nbsp; Tungsten parallel apply then has to undo this good work and recreate streams of non-conflicting updates, which is complex and does not help all workloads.&lt;br /&gt;&lt;br /&gt;It is doubtful that replicating interleaved transactions will be less complex than handling a serial binlog as it stands today.&amp;nbsp; There is also some heavy lifting inside MySQL to get to an interleaved binlog.&amp;nbsp; However, interleaved transactions would have the advantage that transactions for any workload would be parallelized, which would widen the scope of benefits to users.&amp;nbsp; I'm happy to see that Kristian and other people are now working on this feature for future releases of MySQL.&lt;br /&gt;&lt;br /&gt;Meanwhile, we have a workable solution for Tungsten and are pushing it forward as quickly as we can.&amp;nbsp; Contact Continuent if you would like to test it out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-1429183395647695986?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://scale-out-blog.blogspot.com/feeds/1429183395647695986/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=1429183395647695986' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/1429183395647695986'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/1429183395647695986'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2010/10/parallel-replication-on-mysql-report.html' title='Parallel Replication on MySQL:  Report from the Trenches'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_26KnjtB2MFo/TMMLmwZgFqI/AAAAAAAAAGA/iiTwBYg9chY/s72-c/Parallel-Apply-Algorithm.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-2314241342605546919</id><published>2010-10-16T11:00:00.000-07:00</published><updated>2010-10-16T14:38:35.675-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='MariaDB'/><category scheme='http://www.blogger.com/atom/ns#' term='SaaS'/><title type='text'>MySQL Disaster Recovery With Tungsten</title><content type='html'>Disaster recovery (DR) is not the first thing most DBAs think of when putting up a new database application.&amp;nbsp;&amp;nbsp; However, it's one of the top issues for people using the data--what happens if the site goes down 
and everything disappears?&amp;nbsp;&amp;nbsp; So even if DR is not the first issue in every deployment, it is a very high priority as soon as your application is the least bit successful. &lt;br /&gt;&lt;br /&gt;At the database level DR has a fairly simple solution:&amp;nbsp; keep copies of data on a backup site that is up-to-date at all times.&amp;nbsp; This article explains the architecture for MySQL DR with Tungsten and a couple of key features that make it work, namely floating IP addresses and global transaction IDs.&amp;nbsp; We will dig into those at the end. &lt;br /&gt;&lt;br /&gt;First a bit of introduction.&amp;nbsp; Tungsten manages clusters of off-the-shelf databases connected by master/slave replication.&amp;nbsp; There are replication and management services on each host with automated policies to handle failover as well as low-level tasks like recognizing new cluster members.&amp;nbsp; There is a simple management client that lets you log into any host and manage all nodes in the cluster.&amp;nbsp; Tungsten also has connectivity options to let applications find databases easily.&amp;nbsp; However, for this article we are going to focus on the database only and how you solve the problem of ensuring your data are protected.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;u&gt;&lt;b&gt;DR Setup&lt;/b&gt;&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;To implement disaster recovery, you actually create two clusters--one on your main site and one on a backup site which we will henceforth call the DR site.&amp;nbsp; It looks like the following picture.&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;img border="0" height="257" src="http://3.bp.blogspot.com/_26KnjtB2MFo/TLnVMfxkWTI/AAAAAAAAAF0/jBLmzXRyk9U/s400/DR-Arch-Stable.png" style="margin-left: auto; margin-right: auto;" width="400" 
/&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Standard Main/DR Architecture with Backups&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_26KnjtB2MFo/TLnVMfxkWTI/AAAAAAAAAF0/jBLmzXRyk9U/s1600/DR-Arch-Stable.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;Here is an outline of the setup.&amp;nbsp; There are additional details of course but those are covered in Tungsten documentation and support procedures.&amp;nbsp; The goal here is to give you a sense of how things work at the top level.&amp;nbsp; &lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;b&gt;Main site&lt;/b&gt;.&amp;nbsp; Set up the main site cluster as a master/slave pair with a floating IP address on the master.&amp;nbsp; Enable &lt;i&gt;automatic&lt;/i&gt; policy mode so that in the event of a master failure the local slave will immediately take over.&amp;nbsp; Set up backups and run them on the slave on a regular basis.&amp;nbsp; &lt;/li&gt;&lt;li&gt;&lt;b&gt;DR site&lt;/b&gt;.&amp;nbsp; Next, set up the DR cluster by provisioning both databases with a recent backup from the main cluster.&amp;nbsp; Configure it identically to the main site with a master IP address and with backups but with two exceptions.&amp;nbsp; First, use &lt;i&gt;manual&lt;/i&gt; policy mode so that the cluster does not try to fail over. 
Second, do not start replication automatically.&amp;nbsp; Instead, manually configure the DR master to be a slave of the main site master using the master floating IP address and start services.&amp;nbsp; Set up backups on this site as well.&amp;nbsp; &lt;/li&gt;&lt;/ol&gt;&lt;u&gt;&lt;b&gt;Handling Failures &lt;/b&gt;&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;At the end of setup you have a main site with a cluster and a DR site with a cluster that slaves efficiently off the main site master.&amp;nbsp; Both sites have regular backups.&amp;nbsp; As long as there are no failures, you operate both sites and everything is fine.&amp;nbsp; Let us now consider a couple of different types of failures and how to handle them. &lt;br /&gt;&lt;br /&gt;Let's suppose the main site master fails.&amp;nbsp; Tungsten will automatically fail over to the main site slave and move the master floating IP address.&amp;nbsp; The DR site relay slave TCP/IP connection to the master will then break, or more accurately time out.&amp;nbsp; When the relay slave reconnects to the floating IP,&amp;nbsp; the address will have shifted to the new master and replication to the DR site will continue without any human intervention.&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;img border="0" height="257" src="http://4.bp.blogspot.com/_26KnjtB2MFo/TLnXspTlZhI/AAAAAAAAAF8/pZDbvfZh6G4/s400/DR-Arch-Local-Master-Failed.png" style="margin-left: auto; margin-right: auto;" width="400" /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Failed Master on Main Site&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;This protocol is handy because failures are not the only reason that the main site master may move.&amp;nbsp; You can also move masters for maintenance or upgrades.&amp;nbsp; 
Tungsten has a switch command that makes this very easy to do.&amp;nbsp; The floating IP moves as before and the DR site continues to receive updates properly after it reconnects.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;If you lose the main site, you initiate a site switch procedure.&amp;nbsp; At the database level this consists of running a script to "unconfigure" your DR relay slave node so that it becomes a master again and then reloading the configuration.&amp;nbsp; When the node comes up as a master it will then automatically install its own master floating IP address.&amp;nbsp; The commands are simple and run in a few seconds.&amp;nbsp; In most cases it will take a lot longer to switch applications properly than to switch databases, because you have to change DNS entries, start and/or reconfigure applications, and potentially activate other resources to have a functioning system.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;In fact, the real problem with site failover at the database level is not so much failing over but getting the main site back in operation without losing too much data and with as little interruption to users as possible.&amp;nbsp; You first need to check for any transactions that did not make it off the main site and apply them to the DR site master.&amp;nbsp; In MySQL you can do this by &lt;u&gt;&lt;i&gt;carefully&lt;/i&gt;&lt;/u&gt; applying transactions from the main site binlog.&amp;nbsp; You can help yourself considerably by including a step in the site failover process where you fence (i.e., turn off) the old site as quickly as possible by shutting down its applications and taking them offline.&amp;nbsp;&amp;nbsp; The fewer extra transactions on the main site, the simpler it is to clean up. 
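&lt;br /&gt;&lt;br /&gt;As a rough illustration of that cleanup check, here is a small Python sketch.&amp;nbsp; It assumes you can list the global transaction IDs (the feature described at the end of this article) that were committed on the old main site master and those that reached the DR site master.&amp;nbsp; The function name and the plain integer IDs are hypothetical and are not a Tungsten API; in practice you would still pull the actual transactions out of the main site binlog.&lt;br /&gt;
&lt;pre&gt;
```python
def stranded_transactions(main_ids, dr_ids):
    """Return IDs committed on the old main master that the DR master never received."""
    received = set(dr_ids)
    return sorted(i for i in main_ids if i not in received)


# Hypothetical example: the main master committed IDs 100..107 before it was fenced,
# but the DR master had only received 100..104 when the site was lost.
main_ids = list(range(100, 108))
dr_ids = list(range(100, 105))
stranded = stranded_transactions(main_ids, dr_ids)
print(stranded)  # [105, 106, 107]
```
&lt;/pre&gt;
The quicker you fence the old site, the shorter this list is likely to be.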
&lt;br /&gt;&lt;br /&gt;Next, you need to get the main site resynchronized with the DR site.&amp;nbsp; If there are more than a few differences, you will probably just restore the main site master and slave from local backups, then manually configure them to make the main site master a relay slave of the DR site.&amp;nbsp;&amp;nbsp;&amp;nbsp; If you have large databases, you may want to look at SAN or NAS products like NetApp that offer snapshot capabilities.&amp;nbsp; I have been working lately with NetApp; the &lt;b&gt;snap restore&lt;/b&gt; command is really impressive for rolling back file system state quickly.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;img border="0" height="257" src="http://3.bp.blogspot.com/_26KnjtB2MFo/TLnW0PbZUqI/AAAAAAAAAF4/QfDyNihy-F0/s400/DR-Arch-Site-Failed.png" style="margin-left: auto; margin-right: auto;" width="400" /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;DR Site Operation and Main Site Recovery&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_26KnjtB2MFo/TLnW0PbZUqI/AAAAAAAAAF4/QfDyNihy-F0/s1600/DR-Arch-Site-Failed.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;/a&gt;&lt;/div&gt;Once the main site is caught up, you can switch applications back to the main site by taking a short outage to move applications.&amp;nbsp; This step is not fully transparent, but unlike the original DR failover, you get to pick the time that is least inconvenient for your users.&amp;nbsp; Also, you can use Tungsten features like consistency checks to verify that data are consistent across sites.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;u&gt;&lt;b&gt;Underlying Tungsten 
Features to Enable DR&lt;/b&gt;&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;As promised at the beginning, here is a look at the Tungsten features that make DR work.&amp;nbsp; First, there is automated failover with floating IP address management.&amp;nbsp; Tungsten uses a rules engine combined with group communications to manage failover and master floating IPs efficiently.&amp;nbsp; The rules take care of many of the weird failure cases as well as handling tasks like automatically making slave servers readonly, etc.&amp;nbsp; Setting up DR without floating IP addresses is more complex because it means your relay slave needs to know when the main site master moves for any reason. &lt;br /&gt;&lt;br /&gt;As useful as floating IP addresses are, Tungsten has a much more important feature that underlies the entire DR site architecture:&amp;nbsp; global transaction IDs.&amp;nbsp;&amp;nbsp; Unlike native MySQL replication, Tungsten assigns a global ID or &lt;i&gt;seqno&lt;/i&gt; to each transaction as it is read from the binlog.&amp;nbsp; Tungsten replicator processes track position using the seqno values rather than the file name and offset used by MySQL slaves.&amp;nbsp; Here is a picture that illustrates how the replicator log works. 
&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;img border="0" height="221" src="http://2.bp.blogspot.com/_26KnjtB2MFo/TLnU1oG6tvI/AAAAAAAAAFw/y4zcClDa4L4/s400/DR-Arch-Global-IDs.png" style="margin-left: auto; margin-right: auto;" width="400" /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Global IDs, Epoch Numbers, and Backups&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_26KnjtB2MFo/TLnU1oG6tvI/AAAAAAAAAFw/y4zcClDa4L4/s1600/DR-Arch-Global-IDs.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;As already mentioned, the Tungsten master replicator assigns the seqno to each transaction as it is extracted.&amp;nbsp; Tungsten slave replicators always use the seqno to request the next event from the master.&amp;nbsp; This means that you can switch the master without worrying whether slaves will lose track of their positions, because they will just ask for the seqno from the new master.&lt;br /&gt;&lt;br /&gt;The other important feature of global IDs is that they make backups fungible across different databases and even sites.&amp;nbsp; Tungsten marks the database with the current seqno and epoch number.&amp;nbsp; As long as your backup (or file system snapshot) is transactionally consistent, you can load it on any server and bring it back online as a slave.&amp;nbsp; The new slave will connect to and catch up with the master, wherever it happens to be.&amp;nbsp; This makes database recovery both simple and very flexible.&lt;br /&gt;&lt;br /&gt;The phrase "transactional consistency" brings up another 
issue.&amp;nbsp; To make the disaster recovery architecture work reliably I strongly recommend you switch to InnoDB or another fully transactional engine.&amp;nbsp; MyISAM does not have a place in this architecture--there are just too many ways to end up with corrupt data and a massive outage. &lt;br /&gt;&lt;br /&gt;There is one final aspect of Global IDs in Tungsten that is worth mentioning.&amp;nbsp; What if the master log is corrupted or a slave from a different cluster accidentally logs into the wrong master?&amp;nbsp; In both cases the slave could get bad data if it just asked for the next seqno without somehow checking that the master and slave logs are consistent.&amp;nbsp; This would at best lead to errors and in the worst case to badly messed up data. &lt;br /&gt;&lt;br /&gt;Tungsten deals with log consistency problems using epoch numbers. Whenever the master goes online it sets a new epoch number, which works like a parity check on the sequence number.&amp;nbsp;&amp;nbsp; Each time a slave connects to the master, it offers the last seqno it received along with the epoch number.&amp;nbsp; If the values match the epoch number recorded for that seqno in the master log, we assume the logs come from the same master and proceed.&amp;nbsp; Otherwise, we assume somebody is confused and do not allow the slave to fetch transactions.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;u&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;DR site setup is complex and this article obviously glosses over a lot of details even for databases.&amp;nbsp; One final bit of advice is that whatever you do, test the daylights out of it before deploying.&amp;nbsp; Site failures may be karmic but dealing with them is certainly not.&amp;nbsp; Site failover is a &lt;i&gt;&lt;u&gt;really&lt;/u&gt; &lt;/i&gt;bad time to find out you don't have the password to your DNS provider handy or that you have a network configuration problem on the DR site.&amp;nbsp; One customer I know put all the 
computers from his main site and DR site in a pile on his conference room table and tested (and retested and retested and retested) until he was completely satisfied with the results.&amp;nbsp; That is the soul of true disaster recovery.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-2314241342605546919?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/2314241342605546919/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=2314241342605546919' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/2314241342605546919'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/2314241342605546919'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2010/10/mysql-disaster-recovery-with-tungsten.html' title='MySQL Disaster Recovery With Tungsten'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_26KnjtB2MFo/TLnVMfxkWTI/AAAAAAAAAF0/jBLmzXRyk9U/s72-c/DR-Arch-Stable.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-1771163731480546354</id><published>2010-04-23T17:07:00.000-07:00</published><updated>2010-04-23T17:08:04.834-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category 
scheme='http://www.blogger.com/atom/ns#' term='MariaDB'/><category scheme='http://www.blogger.com/atom/ns#' term='Drizzle'/><category scheme='http://www.blogger.com/atom/ns#' term='Oracle'/><title type='text'>MySQL Conference Slides and Thoughts on State of the Dolphin</title><content type='html'>I did two talks on replication and clustering at the recent &lt;a href="http://en.oreilly.com/mysql2010/"&gt;MySQL Conference&lt;/a&gt; in Santa Clara.&amp;nbsp; Thanks to all of you who attended as well as the fine O'Reilly folks who organized everything.&amp;nbsp; Slides are posted on the talk descriptions at the following URLs:&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;div class="en_session_title"&gt;   &lt;ul&gt;&lt;li&gt;&lt;a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13409"&gt;Clustering for the Masses - A Gentle Introduction to Tungsten for MySQL &lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div class="en_session_title"&gt;   &lt;ul&gt;&lt;li&gt;&lt;a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13515"&gt;Not Your Grandpa’s Replication-The New Wave of MySQL Replication and How It Helps Your Applications&lt;/a&gt; (A collaborative talk with &lt;a href="http://www.joinfu.com/"&gt;Jay Pipes&lt;/a&gt;) &lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;Conferences like the MySQL UC are fun because you get to see all your virtual pals in the flesh and have a beer with them.&amp;nbsp; This is one of the fundamental open source bonding experiences.&amp;nbsp; Unfortunately the taps for draft beer stopped working at the bar, and &lt;a href="http://www.continuent.com/downloads/software/register/login"&gt;Tungsten&lt;/a&gt; is in the middle of a big crunch to get parallel replication working.&amp;nbsp; I didn't get to hang around a lot this year.&amp;nbsp; A few things still stood out compared to 2009.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;First of all,&lt;b&gt; long-term effects of the Oracle acquisition are clear.&lt;/b&gt;&amp;nbsp;&amp;nbsp; Edward Screven's 
keynote on &lt;a href="http://en.oreilly.com/mysql2010/public/schedule/detail/12440"&gt;"The State of the Dolphin"&lt;/a&gt; was sensible and boring.&amp;nbsp; It seemed a telling metaphor for life in the community going forward.&amp;nbsp; Oracle is going to do an adequate job of MySQL engineering and better than adequate for Windows.&amp;nbsp; This is of course "adequate" in the same way that the word applies to products like Microsoft Word.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;An adequate MySQL is probably the worst possible outcome for the groups trying to build businesses on alternative builds.&amp;nbsp; It looks like an effective way for Oracle to neutralize competitive threats from below for a few years to come.&amp;nbsp;&amp;nbsp; On the other hand, it's good for most users, who won't be greatly inclined to switch unless Oracle tries to soak them for big licensing fees.&amp;nbsp; At least one conference attendee, a licensee of other Oracle products, mentioned that had already happened.&amp;nbsp; He's a MariaDB fan now.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Second, &lt;b&gt;solid state devices (SSDs) are for real&lt;/b&gt;.&amp;nbsp;&amp;nbsp; Andreas Bechtolsheim gave&lt;a href="http://www.google.com/url?sa=t&amp;amp;source=web&amp;amp;oi=video_result&amp;amp;cad=4681105493028202758&amp;amp;ct=res&amp;amp;cd=1&amp;amp;ved=0CAYQtwIwAA&amp;amp;url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DQPagpPQTaQY&amp;amp;ei=rf7RS6y1EJLOsgP169zSCQ&amp;amp;usg=AFQjCNEGF_M3XAeL28YQCqmo9bAXKiZGdg&amp;amp;sig2=EnuHq5O1DxFl3yDkdKN1vw"&gt; a great talk on the coming SSD revolution&lt;/a&gt; at the 2009 MySQL Conference.&amp;nbsp; It sounded good.&amp;nbsp; At the 2010 conference we started to see some real test results.&amp;nbsp; The hype on SSDs is completely justified.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;There was an &lt;a href="http://en.oreilly.com/mysql2010/public/schedule/detail/14083"&gt;excellent panel talk sponsored by Fusion-IO&lt;/a&gt; that presented some very compelling 
results including 10x throughput improvements that allowed one of the companies doing the testing to pull out and repurpose 75% of their hosts.&amp;nbsp; PCI-based Fusion-IO cards have a 300- to 400X price differential compared to basic rotating disk, but the cost is likely to drop pretty quickly as the technology matures and more competitors enter the field.&amp;nbsp; Much cheaper SATA alternatives like the Intel X-25 are already starting to flood the low-end market.&amp;nbsp; Anybody building database systems has to have a plan that accounts for SSDs &lt;i&gt;&lt;b&gt;now&lt;/b&gt;&lt;/i&gt;.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Third, &lt;b&gt;innovation is continuing apace but the problems (and solutions) are moving away from MySQL.&amp;nbsp;&amp;nbsp; &lt;/b&gt;Mark Callaghan really put his finger on it at &lt;a href="http://en.oreilly.com/mysql2010/public/schedule/detail/14929"&gt;his Ignite MySQL talk&lt;/a&gt; when he said, "In 3 years MySQL won't be the default DBMS for high-scale applications."&amp;nbsp; New system investment is going into applications that handle big data, have to utilize new hardware efficiently to operate economically, and require multi-tenancy.&amp;nbsp; These are good targets for Drizzle, PBXT, Tungsten, and other new projects working to make names for themselves.&amp;nbsp;&amp;nbsp; We all have to raise our game or MySQL will start to become irrelevant. 
&amp;nbsp; It's going to be an interesting year.&amp;nbsp; :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-1771163731480546354?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/1771163731480546354/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=1771163731480546354' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/1771163731480546354'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/1771163731480546354'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2010/04/mysql-conference-slides-and-thoughts-on.html' title='MySQL Conference Slides and Thoughts on State of the Dolphin'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-8106681774691920320</id><published>2010-04-20T22:14:00.000-07:00</published><updated>2010-04-20T22:25:25.891-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='MariaDB'/><category scheme='http://www.blogger.com/atom/ns#' term='SaaS'/><title type='text'>Customized Data Movement with Tungsten Replicator Pipelines</title><content type='html'>Have you ever run into a problem where MySQL replication did 95% of 
what you needed but not the remaining 5% to solve a real problem?&amp;nbsp; Hacking the binlog is always a possibility, but it typically looks like &lt;a href="http://onlinezero.com/blog/?p=67"&gt;this example&lt;/a&gt;.&amp;nbsp; Not a pretty sight.&amp;nbsp; Wouldn't it be easier if replication were a bunch of building blocks you could recombine to create custom replicator processes?&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Tungsten 1.3 has a new feature called &lt;i&gt;pipelines&lt;/i&gt; that allows you to do exactly that.&amp;nbsp; A pipeline consists of one or more stages that tie together generic components to extract, filter, store, and apply &lt;i&gt;events&lt;/i&gt;, which is Tungsten parlance for transactions.&amp;nbsp; Each stage has a processing thread, so multi-stage pipelines can process data independently and without blocking.&amp;nbsp; The stages also take care of important but tedious issues like remembering the transactional state of each stage so Tungsten can restart without forgetting events or applying them twice. 
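&lt;br /&gt;&lt;br /&gt;To make the idea concrete, here is a minimal Python sketch of a two-stage pipeline.&amp;nbsp; It is only an illustration under simplifying assumptions and not Tungsten code: the class names are hypothetical, events are plain (seqno, text) pairs, the stages run one after another instead of on their own threads, and the restart position lives in memory where a real stage would have to persist it.&lt;br /&gt;
&lt;pre&gt;
```python
class ListExtractor:
    """Hypothetical extractor: returns (seqno, payload) events newer than a position."""

    def __init__(self, events):
        self.events = events

    def extract_after(self, last_seqno):
        return [e for e in self.events if e[0] > last_seqno]


class Stage:
    """One stage: extract, filter, apply, and remember the last processed seqno."""

    def __init__(self, extractor, filters, applier):
        self.extractor = extractor
        self.filters = filters
        self.applier = applier   # a plain list stands in for a log or a database
        self.last_seqno = -1     # restart position (kept in memory in this sketch)

    def run_once(self):
        for seqno, payload in self.extractor.extract_after(self.last_seqno):
            event = payload
            for event_filter in self.filters:
                event = event_filter(event)
                if event is None:   # a filter may drop an event entirely
                    break
            if event is not None:
                self.applier.append((seqno, event))
            self.last_seqno = seqno


source = [(0, "insert a"), (1, "insert b"), (2, "insert c")]
log = []        # stands in for a stored event log between the stages
database = []   # stands in for the target DBMS

to_log = Stage(ListExtractor(source), [], log)
to_db = Stage(ListExtractor(log), [str.upper], database)

to_log.run_once()
to_db.run_once()
to_db.run_once()   # applies nothing new because the position is remembered
print(database)    # [(0, 'INSERT A'), (1, 'INSERT B'), (2, 'INSERT C')]
```
&lt;/pre&gt;
Because each stage tracks its own position, running a stage again does not apply an event twice, which is the property the real stages have to preserve across restarts.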
&lt;br /&gt;&lt;br /&gt;Here is a picture of how a pipeline is put together.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_26KnjtB2MFo/S845BRhPz-I/AAAAAAAAAFQ/WlBU2KrxUEs/s1600/tungsten-replicator-slave-pipeline.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="233" src="http://1.bp.blogspot.com/_26KnjtB2MFo/S845BRhPz-I/AAAAAAAAAFQ/WlBU2KrxUEs/s400/tungsten-replicator-slave-pipeline.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;When Tungsten Replicator starts it loads a pipeline corresponding to its role, for example master or slave.&amp;nbsp;&amp;nbsp; The preceding picture shows a slave pipeline consisting of two stages.&amp;nbsp; The first stage pulls replicated events over the network from a master Tungsten Replicator and stores them in a local transaction history log, which we call the THL.&amp;nbsp; The second stage extracts the stored events and applies them to the database.&amp;nbsp;&amp;nbsp; This pipeline is analogous to the I/O and SQL threads on a MySQL slave.&lt;br /&gt;&lt;br /&gt;Where Tungsten departs from MySQL and most other replicators in a big way is that pipelines, hence the replication flows, are completely configurable.&amp;nbsp;&amp;nbsp; The configuration is stored in file replicator.properties.&amp;nbsp; Here are the property settings to create the slave pipeline.&amp;nbsp; Note how the role is the name of a pipeline.&amp;nbsp; This determines which pipeline to run when the replicator goes online. &lt;br /&gt;&lt;br /&gt;&lt;code size="small"&gt;# Replicator role.&amp;nbsp; &lt;br /&gt;replicator.role=slave&lt;/code&gt;&lt;br /&gt;&lt;code size="small"&gt;...&lt;br /&gt;# Generic pipelines.  
&lt;br /&gt;replicator.pipelines=master,slave,direct&amp;nbsp;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code size="small"&gt;...&lt;br /&gt;# Slave pipeline has two stages:&amp;nbsp; extract from remote THL to local THL; &lt;br /&gt;# extract from local THL and apply to DBMS. &lt;br /&gt;replicator.pipeline.slave=remote-to-thl,thl-to-dbms&lt;br /&gt;replicator.pipeline.slave.stores=thl&lt;br /&gt;replicator.pipeline.slave.syncTHLWithExtractor=false&lt;br /&gt;&lt;br /&gt;replicator.stage.remote-to-thl=com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask&lt;br /&gt;replicator.stage.remote-to-thl.extractor=thl-remote&lt;br /&gt;replicator.stage.remote-to-thl.applier=thl-local&lt;br /&gt;&lt;br /&gt;replicator.stage.thl-to-dbms=com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask&lt;br /&gt;replicator.stage.thl-to-dbms.extractor=thl-local&lt;br /&gt;replicator.stage.thl-to-dbms.applier=mysql&lt;br /&gt;replicator.stage.thl-to-dbms.filters=mysqlsessions&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;The syntax is not beautiful but it is quite flexible.&amp;nbsp; Here is what this definition means.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;This replicator knows about three pipelines named &lt;i&gt;master&lt;/i&gt;, &lt;i&gt;slave&lt;/i&gt;, and &lt;i&gt;direct&lt;/i&gt;.&amp;nbsp;&lt;/li&gt;&lt;li&gt;The slave pipeline has two stages called &lt;i&gt;remote-to-thl&lt;/i&gt; and &lt;i&gt;thl-to-dbms&lt;/i&gt; and a store called &lt;i&gt;thl&lt;/i&gt;.&amp;nbsp; It has a property named syncTHLWithExtractor which must be set to false for slaves.&amp;nbsp; (We need to change that name to something like 'isMaster'.)&amp;nbsp; &lt;/li&gt;&lt;li&gt;The remote-to-thl stage extracts from &lt;i&gt;thl-remote&lt;/i&gt;.&amp;nbsp; This extractor reads events over the network from a remote replicator.&amp;nbsp; The stage applies to &lt;i&gt;thl-local&lt;/i&gt;, which is an applier that writes events to the local transaction history log.&amp;nbsp; &lt;/li&gt;&lt;li&gt;The thl-to-dbms 
stage pulls events from the local log and applies them to the database.&amp;nbsp; Note that in addition to an applier and extractor, there is also a filter named &lt;i&gt;mysqlsessions&lt;/i&gt;.&amp;nbsp; This filter looks at events and modifies them to generate a pseudo-session ID, which is necessary to avoid problems with temporary tables when applying transactions from multiple sessions.&amp;nbsp; It is just one of a number of filters that Tungsten provides. &lt;/li&gt;&lt;/ol&gt;Components like appliers, filters, extractors, and stores have individual configuration elsewhere in the replicator.properties file.&amp;nbsp; Here's an example of configuration for a MySQL binlog extractor.&amp;nbsp; (Note that Tungsten 1.3 can now read binlogs directly as files or relay them from a master server.)&amp;nbsp; &lt;br /&gt;&lt;pre&gt;&lt;code size="small"&gt;&lt;br /&gt;# MySQL binlog extractor properties.&amp;nbsp; &lt;br /&gt;replicator.extractor.mysql=com.continuent.tungsten.replicator.extractor.mysql.MySQLExtractor&lt;br /&gt;replicator.extractor.mysql.binlog_dir=/var/log/mysql&lt;br /&gt;replicator.extractor.mysql.binlog_file_pattern=mysql-bin&lt;br /&gt;replicator.extractor.mysql.host=logos1-u1&lt;br /&gt;replicator.extractor.mysql.port=3306&lt;br /&gt;replicator.extractor.mysql.user=${replicator.global.db.user}&lt;br /&gt;replicator.extractor.mysql.password=${replicator.global.db.password}&lt;br /&gt;replicator.extractor.mysql.parseStatements=true&lt;br /&gt;&lt;br /&gt;# When using relay logs we download from the master into binlog_dir.&amp;nbsp; This &lt;br /&gt;# is used for off-board replication. 
&lt;br /&gt;#replicator.extractor.mysql.useRelayLogs=false&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;The thing that makes pipelines really flexible is that the interfaces are completely symmetric.&amp;nbsp; Components to extract events from MySQL binlog or from a transaction history log have identical APIs.&amp;nbsp; Similarly, the APIs to apply events are the same whether storing events in a log or applying to a slave.&amp;nbsp; Pipelines can tie together practically any sequence of extract, filter, and apply operations you can think of.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Here are diagrams of a couple of useful single-stage pipelines.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_26KnjtB2MFo/S85bAZ9KgTI/AAAAAAAAAFY/GwdvtloqtsE/s1600/tungsten-replicator-other-pipelines.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="231" src="http://4.bp.blogspot.com/_26KnjtB2MFo/S85bAZ9KgTI/AAAAAAAAAFY/GwdvtloqtsE/s400/tungsten-replicator-other-pipelines.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The "dummy" pipeline reads events directly from MySQL binlogs and just throws them away.&amp;nbsp; This sounds useless but in fact it is rather convenient.&amp;nbsp; You can use the dummy pipeline to check whether your binlogs are good.&amp;nbsp; If you add filters you can also use a dummy pipeline to report on what is in the binlog.&amp;nbsp; Finally, you can use it as a quick and non-intrusive check to see if Tungsten can handle the data in your binlog--a nice way to ensure you can migrate smoothly.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Here's the dummy pipeline definition: &lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;code size="small"&gt;# Generic pipelines. 
&lt;br /&gt;replicator.pipelines=master,slave,direct,dummy&lt;br /&gt;...&lt;br /&gt;# Dummy pipeline has a single stage that writes from binlog to bit-bucket. &lt;br /&gt;replicator.pipeline.dummy=binlog-to-dummy&lt;br /&gt;replicator.pipeline.dummy.autoSync=true&lt;br /&gt;&lt;br /&gt;replicator.stage.binlog-to-dummy=com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask&lt;br /&gt;replicator.stage.binlog-to-dummy.extractor=mysql&lt;br /&gt;replicator.stage.binlog-to-dummy.applier=dummy&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;The "direct" pipeline fetches events directly from a master MySQL server using client log requests over the network and applies them immediately to a slave.&amp;nbsp; I use this pipeline to test master-to-slave performance, but it's also very handy for transferring a set of SQL updates from the binlog of any master to any slave on the network.&amp;nbsp; For instance, you can transfer upgrade commands very efficiently out of the binlog of a successfully upgraded MySQL server to other servers on the network.&amp;nbsp; You can also use it to "rescue" transactions that are stuck in the binlog of a failed master.&amp;nbsp; That is starting to be genuinely useful.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;The definition of the direct pipeline is already in the default replicator.properties.mysql template that comes with Tungsten 1.3, so it is not necessary to repeat it here.&amp;nbsp; You can just &lt;a href="http://www.continuent.com/downloads/software/register/login"&gt;download the software&lt;/a&gt; (open source version is &lt;a href="http://sourceforge.net/projects/tungsten/files/"&gt;here&lt;/a&gt;) and have a look at it yourself.&amp;nbsp; There's almost more documentation than people can bear--look &lt;a href="http://www.continuent.com/downloads/documentation"&gt;here&lt;/a&gt; to find a full set.&amp;nbsp; Version 1.3 docs will be posted shortly on the website and are already available for commercial 
customers.&amp;nbsp;&amp;nbsp; As usual you can also &lt;a href="http://tungsten.svn.sourceforge.net/viewvc/tungsten/"&gt;view the source code&lt;/a&gt; on SourceForge.net.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Pipelines belong to a set of major feature improvements to Tungsten to support SaaS and large enterprise deployments.&amp;nbsp; Some of the other features include fast event logging directly to disk (no more posting events in InnoDB), low-latency WAN transfer, multi-master replication support, and parallel replication.&amp;nbsp; Stay tuned!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-8106681774691920320?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/8106681774691920320/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=8106681774691920320' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/8106681774691920320'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/8106681774691920320'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2010/04/customized-data-movement-with-tungsten.html' title='Customized Data Movement with Tungsten Replicator Pipelines'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://1.bp.blogspot.com/_26KnjtB2MFo/S845BRhPz-I/AAAAAAAAAFQ/WlBU2KrxUEs/s72-c/tungsten-replicator-slave-pipeline.jpg' height='72' width='72'/><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-3570478421140412140</id><published>2010-03-28T22:24:00.000-07:00</published><updated>2010-03-28T22:27:21.839-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='SaaS'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Drizzle'/><title type='text'>New Tungsten Software Releases for MySQL and PostgreSQL</title><content type='html'>I would like to announce a couple of new Tungsten versions available for your database clustering enjoyment.&amp;nbsp; As most readers of this blog are aware, Tungsten allows users to create highly available data services that include replicated copies, distributed management, and application connectivity using unaltered open source databases.&amp;nbsp;&amp;nbsp; We are continually improving the software and have a raft of new features coming out this year.&amp;nbsp;&amp;nbsp; &lt;br /&gt;&lt;br /&gt;First, there is a new Tungsten 1.2.3 maintenance release available in both commercial and open source editions.&amp;nbsp; You can get access to the &lt;a href="http://www.continuent.com/downloads"&gt;commercial version on the Continuent website&lt;/a&gt;, while the &lt;a href="https://sourceforge.net/projects/tungsten/"&gt;open source version is available on SourceForge&lt;/a&gt;.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;The Tungsten 1.2.3 release focuses on improvements for MySQL users including the following: &lt;br /&gt;&lt;ul&gt;&lt;li&gt;Transparent session consistency for multi-tenant applications.&amp;nbsp; This allows applications that follow some simple 
conventions like sharding tenant data by database to get automatic read scaling to slaves without making code changes. &lt;/li&gt;&lt;li&gt;A greatly improved script for purging history on Tungsten Replicator.&amp;nbsp; &lt;/li&gt;&lt;li&gt;Fixes to binlog extraction to handle enum and set data types correctly.&amp;nbsp; &lt;/li&gt;&lt;/ul&gt;By far the biggest improvement in this release is &lt;a href="http://www.continuent.com/downloads/documentation"&gt;Tungsten product documentation&lt;/a&gt;, including major rewrites for the guides covering management and connectivity.&amp;nbsp; Even the &lt;a href="http://www.continuent.com/images/stories/pdfs/tungsten-release-notes.pdf"&gt;Release Notes&lt;/a&gt; are better.&amp;nbsp; If you want to find out how Tungsten works, start with the new &lt;a href="http://www.continuent.com/images/stories/pdfs/tungsten-concepts-and-administration-guide.pdf"&gt;Tungsten Concepts and Administration Guide&lt;/a&gt;.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Second, there's a new Tungsten 1.3 release coming out soon.&amp;nbsp; Commercial versions are already in use at selected customer sites, and you can build the open source version by &lt;a href="http://sourceforge.net/projects/tungsten/develop"&gt;downloading code from SVN on SourceForge&lt;/a&gt;.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;The Tungsten 1.3 release sports major feature additions in the following areas:&amp;nbsp; &lt;br /&gt;&lt;ul&gt;&lt;li&gt;A new replicator architecture that allows you to manage non-Tungsten replication and also to configure very flexible replication flows to use multi-core systems more effectively and implement complex replication topologies.&amp;nbsp; The core processing loop for replication can now cycle through 700,000 events per second on my laptop--it's really quick.&amp;nbsp; &lt;/li&gt;&lt;li&gt;Much improved support for PostgreSQL warm standby clustering as well as provisional management of new PostgreSQL 9 features like streaming replication and hot 
standby.&amp;nbsp;&amp;nbsp;&lt;/li&gt;&lt;li&gt;Replication support for just about everything in the MySQL binlog:&amp;nbsp; large transactions, unsigned characters, session variables, various permutations of character sets and binary data, and the ability to download binlog files through the MySQL client protocol.&amp;nbsp; If you can put it in the binlog we can replicate it.&amp;nbsp;&amp;nbsp;&lt;/li&gt;&lt;/ul&gt;We also have provisional support for Drizzle thanks to &lt;a href="http://developian.blogspot.com/2009/10/replication-from-mysql-to-drizzle-using.html"&gt;Marcus Eriksson&lt;/a&gt;, plus a raft of other improvements.&amp;nbsp; This has been a huge amount of work all around, so I hope you'll enjoy the results.&lt;br /&gt;&lt;br /&gt;P.s., Contact Continuent if you want to be a beta test site for Tungsten 1.3.&amp;nbsp;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-3570478421140412140?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/3570478421140412140/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=3570478421140412140' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3570478421140412140'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3570478421140412140'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2010/03/new-tungsten-software-releases-for.html' title='New Tungsten Software Releases for MySQL and PostgreSQL'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-4140150119673497442</id><published>2010-03-22T17:18:00.000-07:00</published><updated>2010-03-22T17:24:36.387-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><title type='text'>Replication and More Replication at 2010 MySQL Conference</title><content type='html'>Database replication is still interesting after all these years.&amp;nbsp; Two of my talks focused on replication technology were accepted for the upcoming &lt;a href="http://en.oreilly.com/mysql2010/"&gt;MySQL 2010 Conference&lt;/a&gt;.&amp;nbsp; Here are the summaries.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13409"&gt;Clustering for the Masses - A Gentle Introduction to Tungsten for MySQL&amp;nbsp;&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13515"&gt;Not Your Grandpa’s Replication-The New Wave of MySQL Replication and How It Helps Your Applications&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;The first talk is a solo presentation covering &lt;a href="http://www.continuent.com/community"&gt;Tungsten&lt;/a&gt;, which creates highly available and scalable database clusters using vanilla MySQL databases linked by flexible replication.&amp;nbsp; I'll describe how it works and some cool things you can do like zero-downtime upgrades and session-based performance scaling.&amp;nbsp;&amp;nbsp; If you want to know how Tungsten can help you, this is a good time to find out.&lt;br /&gt;&lt;br /&gt;The second talk is a joint effort with &lt;a href="http://en.oreilly.com/mysql2010/public/schedule/speaker/92"&gt;Jay Pipes&lt;/a&gt; covering issues like big data that are driving 
replication technology and the solutions to these problems available to MySQL users.&amp;nbsp; We'll lay out our vision of where things are going to try to help you pick the right technology for your next project.&amp;nbsp;&amp;nbsp; Jay and I are also soliciting input on this talk from the Drizzle community among others.&amp;nbsp; If you are interested check out &lt;a href="https://lists.launchpad.net/drizzle-discuss/msg06198.html"&gt;the thread on drizzle-discuss&lt;/a&gt; or post to this blog. &lt;br /&gt;&lt;br /&gt;Finally, I'll be around for much of the MySQL conference, so if you are interested in Tungsten or data replication in general or just want to hang out, please look me up. &amp;nbsp; See you in Santa Clara! &lt;br /&gt;&lt;br /&gt;&lt;div class="en_session_title"&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-4140150119673497442?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/4140150119673497442/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=4140150119673497442' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/4140150119673497442'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/4140150119673497442'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2010/03/replication-and-more-replication-at.html' title='Replication and More Replication at 2010 MySQL Conference'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-5877420089574698487</id><published>2010-03-22T11:50:00.000-07:00</published><updated>2010-03-22T11:51:50.704-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><title type='text'>Tungsten and PostgreSQL 9 at PG-East Conference</title><content type='html'>My Continuent colleagues Linas Virbalas and Alex Alexander will be giving a talk entitled &lt;a href="http://postgresqlconference.org/2010/east/talks/building_tungsten_clusters"&gt;Building Tungsten Clusters with PostgreSQL Hot Standby and Streaming Replication&lt;/a&gt; later this week at the &lt;a href="http://postgresqlconference.org/2010/east/"&gt;PG-East Conference&lt;/a&gt; in Philadelphia.&amp;nbsp;&amp;nbsp; I saw the demo last week and it's quite impressive.&amp;nbsp; You can flip the master and slaves for maintenance, open slaves for reads, failover automatically, etc.&amp;nbsp; It's definitely worth attending if you are in Philly this week.&lt;br /&gt;&lt;br /&gt;Looking beyond the conference, we plan to be ready to support Tungsten clusters on PostgreSQL 9 as soon as it goes production.&amp;nbsp;&amp;nbsp; Everything we have seen so far indicates that the new log streaming and hot standby features are going to be real hits.&amp;nbsp; They not only help applications, but from a clustering perspective queryable slaves with minimal replication lag are also a lot easier to manage.&amp;nbsp; Alex and Linas will have more to say about that during their presentation.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Meanwhile, I'm sorry to miss the PG-East conference but wish everyone who will be attending a great time.&amp;nbsp; See you later this year at PG-West!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/768233104244702633-5877420089574698487?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/5877420089574698487/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=5877420089574698487' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5877420089574698487'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5877420089574698487'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2010/03/tungsten-and-postgresql-9-at-pg-east_22.html' title='Tungsten and PostgreSQL 9 at PG-East Conference'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-7869508467428872015</id><published>2010-01-28T18:10:00.000-08:00</published><updated>2010-01-28T18:14:05.058-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='MariaDB'/><title type='text'>MariaDB is Thinking about Fixing MySQL Replication and You Can Help</title><content type='html'>In case you have not noticed, &lt;a href="https://launchpad.net/maria"&gt;MariaDB&lt;/a&gt; is joining the list of projects thinking about how to improve MySQL replication.&amp;nbsp;&amp;nbsp; The discussion thread starts &lt;a 
href="https://lists.launchpad.net/maria-developers/msg01998.html"&gt;here&lt;/a&gt; on the &lt;a href="https://launchpad.net/%7Emaria-developers"&gt;maria-developers mailing list&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;This discussion was jointly started by &lt;a href="http://askmonty.org/wiki/index.php/Main_Page"&gt;Monty Program&lt;/a&gt;, &lt;a href="http://www.codership.com/"&gt;Codership&lt;/a&gt;, and &lt;a href="http://www.continuent.com/"&gt;Continuent&lt;/a&gt; (my employer) in an effort to push the state of the art beyond features offered by the current MySQL replication.&amp;nbsp; Now that things are starting to die down with the Oracle acquisition, we can get back to the job of making the MySQL code base substantially better.&amp;nbsp; The first step in that effort is to get a discussion going to develop our understanding of the replication problems we think are most important and outline a strategy to solve them.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Speaking as a developer on Tungsten, my current preference would be to improve the existing MySQL replication.&amp;nbsp; I suspect this would also be the preference of most current MySQL users.&amp;nbsp; However, there are also more radical approaches on the table, for example from our friends at Codership, who are developing an innovative form of multi-master replication based on group communications and transaction certification.&amp;nbsp; That's a good thing, as we want a range of contrasting ideas that take full advantage of the creativity in the community on the topic of replication. 
&lt;br /&gt;&lt;br /&gt;If you have interest in improving MySQL replication please join the MariaDB project and contribute your thoughts.&amp;nbsp; It should be an interesting conversation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-7869508467428872015?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/7869508467428872015/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=7869508467428872015' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/7869508467428872015'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/7869508467428872015'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2010/01/mariadb-is-thinking-about-fixing-mysql.html' title='MariaDB is Thinking about Fixing MySQL Replication and You Can Help'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-6285892869366739459</id><published>2010-01-27T12:16:00.000-08:00</published><updated>2010-01-28T07:01:53.636-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Drizzle'/><title type='text'>Tungsten 1.2.2 
Release is Out - Faster, More Stable, More Fun</title><content type='html'>Release 1.2.2 of Tungsten Clustering is available on &lt;a href="http://sourceforge.net/projects/tungsten/"&gt;SourceForge&lt;/a&gt; as well as through the &lt;a href="http://www.continuent.com/"&gt;Continuent website&lt;/a&gt;. &amp;nbsp;The release contains mostly bug fixes in the open source version but there are also two very significant improvements of interest to all users.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The manager and monitoring capabilities of Tungsten are completely integrated on the same group communications channel. &amp;nbsp;This fixes a number of problems that caused data sources not to show up properly in older versions. &amp;nbsp;&lt;/li&gt;&lt;li&gt;We are officially supporting a new Tungsten Connector capability for MySQL called pass-through mode, which allows us to proxy connections by transferring network blocks directly rather than translating native request protocol to JDBC calls. &amp;nbsp;Our tests show that it speeds up throughput by as much as 200% in some cases.&amp;nbsp;&lt;/li&gt;&lt;/ul&gt;The commercial version has additional features like PostgreSQL warm standby clustering, add-on rules to manage master virtual IP addresses and other niceties. &amp;nbsp; If you are serious about replication and clustering it is worth a look.&lt;br /&gt;&lt;br /&gt;This is a good time to give a couple of reminders for Tungsten users. &amp;nbsp;First, Tungsten is distributed as a single build that integrates replication, management, monitoring, and connectivity. &amp;nbsp; The old Tungsten Replicator and Myosotis builds are going away. &amp;nbsp; Second, we have &lt;a href="http://www.continuent.com/downloads/documentation"&gt;a single set of docs&lt;/a&gt; on the Continuent website that covers both open source and commercial distributions. &lt;br /&gt;&lt;br /&gt;With that, enjoy the new release. 
&amp;nbsp;If you are using the open source edition, please post your experiences in &lt;a href="http://www.continuent.com/community/forum"&gt;the Tungsten community forums&lt;/a&gt; or write a blog article. &amp;nbsp;We would love to hear from you.&lt;br /&gt;&lt;br /&gt;P.s., We have added Drizzle support thanks to &lt;a href="http://developian.blogspot.com/2009/10/replication-from-mysql-to-drizzle-using.html"&gt;a patch from Marcus Eriksson&lt;/a&gt; but it's not in 1.2.2. &amp;nbsp;For that you need to build directly from the SVN trunk. &amp;nbsp;Drizzle support will be out in binary builds as part of Tungsten version 1.3.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-6285892869366739459?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/6285892869366739459/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=6285892869366739459' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/6285892869366739459'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/6285892869366739459'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2010/01/tungsten-122-release-is-out-faster-more.html' title='Tungsten 1.2.2 Release is Out - Faster, More Stable, More Fun'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-586800230737780348</id><published>2010-01-17T00:58:00.000-08:00</published><updated>2010-01-17T09:52:49.359-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><title type='text'>What's in *Your* Binlog?</title><content type='html'>Over the last couple of months I have run into a number of replication problems where I needed to run reports on MySQL binlogs to understand what sort of updates servers were processing as well as to compute peak and average throughput.&amp;nbsp;&amp;nbsp; It seems that not even &lt;a href="http://www.maatkit.org/"&gt;Maatkit&lt;/a&gt; has a simple tool to report on binlog contents, so I wrote a quick Perl script called &lt;a href="http://tungsten.svn.sourceforge.net/viewvc/tungsten/trunk/replicator/bin/binlog-analyze.pl?revision=1408&amp;amp;view=markup"&gt;binlog-analyze.pl&lt;/a&gt; to summarize output from mysqlbinlog, which is the standard tool to dump binlogs to text.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;The script operates as a filter with the following syntax:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;Usage: ./binlog-analyze.pl [-h] [-q] [-v]&lt;br /&gt;Options:&lt;br /&gt;  -h : Print help&lt;br /&gt;  -q : Suppress excess output&lt;br /&gt;  -v : Print verbosely for debugging&lt;/code&gt;&lt;/pre&gt;To get a report, you just run mysqlbinlog on a binlog file and pipe the results into binlog-analyze.pl.&amp;nbsp; Here is a typical invocation and output.&amp;nbsp; The -q option keeps the output as short as possible.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;$ mysqlbinlog /var/lib/mysql/mysql-bin.001430 | ./binlog-analyze.pl -q&lt;br /&gt;===================================&lt;br /&gt;| SUMMARY INFORMATION             |&lt;br /&gt;===================================&lt;br /&gt;Server Version    : 5.0.89&lt;br /&gt;Binlog Version    : 4&lt;br /&gt;Duration          : 1:03:37 
(3817s)&lt;br /&gt;&lt;br /&gt;===================================&lt;br /&gt;| SUMMARY STATISTICS              |&lt;br /&gt;===================================&lt;br /&gt;Lines Read        :        17212685&lt;br /&gt;Events            :         3106006&lt;br /&gt;Bytes             :      1073741952&lt;br /&gt;Queries           :         2235077&lt;br /&gt;Xacts             :          817575&lt;br /&gt;Max. Events/Second:            5871.00&lt;br /&gt;Max. Bytes/Second :         1990077.00&lt;br /&gt;Max. Event Bytes  :          524339&lt;br /&gt;Avg. Events/Second:             813.73&lt;br /&gt;Avg. Bytes/Second :          281305.20&lt;br /&gt;Avg. Queries/Sec. :             585.56&lt;br /&gt;Avg. Xacts/Sec.   :             214.19&lt;br /&gt;Max. Events Time  :         9:01:02&lt;br /&gt;&lt;br /&gt;===================================&lt;br /&gt;| EVENT COUNTS                    |&lt;br /&gt;===================================&lt;br /&gt;Execute_load_query   :           10&lt;br /&gt;Intvar               :        53160&lt;br /&gt;Query                :      2235077&lt;br /&gt;Rotate               :            1&lt;br /&gt;Start                :            1&lt;br /&gt;User_var             :          182&lt;br /&gt;Xid                  :       817575&lt;br /&gt;&lt;br /&gt;===================================&lt;br /&gt;| SQL STATEMENT COUNTS            |&lt;br /&gt;===================================&lt;br /&gt;begin                :       817585&lt;br /&gt;create temp table    :            0&lt;br /&gt;delete               :        31781&lt;br /&gt;insert               :           20&lt;br /&gt;insert into          :       411266&lt;br /&gt;select into          :            0&lt;br /&gt;update               :       633857&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;There are lots of things to see in the report, so here are a few examples.&amp;nbsp; For one thing, peak update rates generate 5871 events and close to 2MB of log output per second.&amp;nbsp; That's loaded but 
not enormously so--MySQL replication can easily dump over 10,000 events per second into the binlog using workhorse 4-core machines.&amp;nbsp; The application(s) connected to the database execute a large number of fast, short transactions--typical of data logging operations, for example storing session data.&amp;nbsp; We can also see from the Execute_load_query events that somebody executed MySQL &lt;code&gt;LOAD DATA INFILE &lt;/code&gt;commands.&amp;nbsp; That's interesting to me because we are just putting them into Tungsten and need to look out for them in user databases. &lt;br /&gt;&lt;br /&gt;To interpret the binlog report most effectively, you need to understand MySQL binlog event types.&amp;nbsp; MySQL replication developers have kindly provided &lt;a href="http://forge.mysql.com/wiki/MySQL_Internals_Binary_Log"&gt;a very helpful description of the MySQL binlog format&lt;/a&gt; that is not hard to read.&amp;nbsp; You'll need to refer to it if you get very deeply into binlog analysis.&amp;nbsp; It certainly beats reading the MySQL replication code, which is a bit of a thicket.&lt;br /&gt;&lt;br /&gt;Anyway, I hope this script proves useful.&amp;nbsp; As you may have noted from the URL the script is checked into the &lt;a href="http://www.continuent.com/community"&gt;Tungsten&lt;/a&gt; project on &lt;a href="http://sourceforge.net/projects/tungsten/"&gt;SourceForge&lt;/a&gt; and will be part of future releases.&amp;nbsp; I plan to keep tweaking it regularly to add features and fix bugs.&amp;nbsp; Incidentally, if you see any bugs let me know.&amp;nbsp; There are without doubt a couple left.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-586800230737780348?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/586800230737780348/comments/default' title='Post 
Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=586800230737780348' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/586800230737780348'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/586800230737780348'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2010/01/whats-in-your-binlog.html' title='What&apos;s in *Your* Binlog?'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-7353833054966054064</id><published>2010-01-02T18:32:00.000-08:00</published><updated>2010-01-02T18:40:39.164-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='SaaS'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><title type='text'>Exploring SaaS Architectures and Database Clustering</title><content type='html'>&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;Software-as-a-Service (SaaS) is one of the main growth areas in modern database applications.&amp;nbsp;  This topic has become a correspondingly important focus for &lt;a href="http://www.continuent.com/community"&gt;Tungsten&lt;/a&gt;, not least of all because new SaaS applications make heavy use of open source databases like MySQL and PostgreSQL that Tungsten supports. 
&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;This blog article introduces a series of essays on database architectures for SaaS and how we are adapting Tungsten to enable them more easily.&amp;nbsp;  I plan to focus especially on problems of replication and clustering relevant to SaaS—what are the problems, what are the common design patterns to solve them, and how to deploy and operate the solutions.  I will also discuss how to make replication and clustering work better for these cases—either using Tungsten features that already exist or features we are designing.   &lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;I hope everything you read will be solid, useful stuff.  However, I will also discuss problems where we are in effect thinking out loud about on-going design issues, so you may also see some ideas that are half-baked or flat-out wrong.&amp;nbsp; Please do me the kindness of pointing out how they can be improved.  &lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;Now let's get started.  The most important difference between SaaS applications and ordinary apps is &lt;i&gt;multi-tenancy.&lt;/i&gt;  SaaS applications are typically designed from the ground up to run multiple tenants (i.e., customers) on shared software and hardware.  
One popular design pattern is to have users share applications but keep each tenant's data stored in a separate database, spreading the tenant databases over multiple servers as the number of tenants grows.&amp;nbsp; &lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;Multi-tenancy has a number of important impacts on database architecture.  I'm going to mention just three, but they are all significant.  First of all, multi-tenant databases tend to evolve into complex topologies.  Here's a simple example that shows how a successful SaaS application quickly grows from a single, harmless DBMS server to five servers linked by replication with rapid growth beyond. &amp;nbsp; &lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_26KnjtB2MFo/Sz_vuplJL2I/AAAAAAAAAEg/FeDE6u7omtE/s1600-h/SaaS-Topology-Change.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/_26KnjtB2MFo/Sz_vuplJL2I/AAAAAAAAAEg/FeDE6u7omtE/s400/SaaS-Topology-Change.jpg" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;In the beginning, the application has tenant data stored in separate databases plus an extra database for&amp;nbsp; the list of tenants as well as data shared by every application.  In accounting applications, for example, the shared information would include items like currency exchange and VAT rates that are identical for each tenant.  
Everything fits into a single DBMS server and life is good.&amp;nbsp; &lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;Now business booms and more tenants join, so soon we split the single server into three—a server for the shared data plus two tenant servers.  We add replication to move the shared data into tenant databases.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;Meanwhile business booms still more.  Tenants want to run reports, which have a tendency to hammer  the tenant servers.  We set up separate analytics servers with optimized hardware and alternative indexing on the schema, plus more replication to load data dynamically from tenant databases. &lt;br /&gt;&lt;br /&gt;And this is just the beginning of additional servers as the SaaS adds more customers and invents new services.&amp;nbsp; It is not uncommon for successful SaaS vendors to run 20 or more DBMS servers, especially when you count slave copies maintained for failover and consider that many SaaS vendors also operate multiple sites.&amp;nbsp; At some point in this evolution the topology, including replication as well as management of the databases, is no longer manually maintainable.&amp;nbsp; As we say in baseball, Welcome to the Bigs.   &lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;Complex topologies with multiple DBMS servers lead to a second significant SaaS issue: failures.  Just having a lot of servers already means failures are a bigger problem than when you run a single DBMS instance.   
To show why, let's say individual DBMS servers fail in a way that requires you do something about it on average once a year, a number that reliability engineers call &lt;a href="http://en.wikipedia.org/wiki/MTBF"&gt;&lt;i&gt;Mean Time between Failures&lt;/i&gt;&lt;/a&gt; (MTBF).   Here is a simple table that shows how often we can expect an individual failure to occur.&amp;nbsp; (Supply your own numbers. &amp;nbsp; These are just plausible samples.) &lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;table border="1" bordercolor="#000000" cellpadding="0" cellspacing="0" height="141" rules="GROUPS" style="width: 478px;"&gt;&lt;col width="123*"&gt;&lt;/col&gt;  &lt;col width="133*"&gt;&lt;/col&gt;  &lt;tbody&gt;&lt;tr valign="TOP"&gt;    &lt;th bgcolor="#000000" sdnum="1033;1033;@" sdval="0" style="font-family: inherit;" width="48%"&gt;&lt;div align="LEFT" style="text-decoration: none;"&gt;&lt;span style="color: white;"&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;&lt;b&gt;Number     of DBMS Hosts&lt;/b&gt;&lt;/i&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/th&gt;    &lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;th bgcolor="#000000" sdnum="1033;1033;@" sdval="0" width="52%"&gt;&lt;div align="LEFT" style="text-decoration: none;"&gt;&lt;span style="color: white;"&gt;&lt;span style="font-family: Thorndale,serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;&lt;b&gt;Days     Between Failures&lt;/b&gt;&lt;/i&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/th&gt;   &lt;/tr&gt;&lt;tr valign="TOP"&gt;    &lt;td bgcolor="#ffffff" sdnum="1033;1033;@" sdval="0" width="48%"&gt;&lt;div align="LEFT" style="font-style: normal; font-weight: normal; text-decoration: none;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-family: Thorndale,serif;"&gt;&lt;span style="font-size: small;"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br 
/&gt;&lt;/div&gt;&lt;/td&gt;    &lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td bgcolor="#ffffff" sdnum="1033;1033;General" sdval="365" width="52%"&gt;&lt;div align="LEFT" style="font-style: normal; font-weight: normal; text-decoration: none;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-family: Thorndale,serif;"&gt;&lt;span style="font-size: small;"&gt;365&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt;   &lt;/tr&gt;&lt;/tbody&gt;  &lt;tbody&gt;&lt;tr valign="TOP"&gt;    &lt;td bgcolor="#ffffff" sdnum="1033;1033;@" sdval="0" width="48%"&gt;&lt;div align="LEFT" style="font-style: normal; font-weight: normal; text-decoration: none;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-family: Thorndale,serif;"&gt;&lt;span style="font-size: small;"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt;    &lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td bgcolor="#ffffff" sdnum="1033;1033;General" sdval="182.5" width="52%"&gt;&lt;div align="LEFT" style="font-style: normal; font-weight: normal; text-decoration: none;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-family: Thorndale,serif;"&gt;&lt;span style="font-size: small;"&gt;182.5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt;   &lt;/tr&gt;&lt;/tbody&gt;  &lt;tbody&gt;&lt;tr valign="TOP"&gt;    &lt;td bgcolor="#ffffff" sdnum="1033;1033;@" sdval="0" width="48%"&gt;&lt;div align="LEFT" style="font-style: normal; font-weight: normal; text-decoration: none;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-family: Thorndale,serif;"&gt;&lt;span style="font-size: small;"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt;    &lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td bgcolor="#ffffff" sdnum="1033;1033;General" sdval="91.3" 
width="52%"&gt;&lt;div align="LEFT" style="font-style: normal; font-weight: normal; text-decoration: none;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-family: Thorndale,serif;"&gt;&lt;span style="font-size: small;"&gt;91.3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt;   &lt;/tr&gt;&lt;/tbody&gt;  &lt;tbody&gt;&lt;tr valign="TOP"&gt;    &lt;td bgcolor="#ffffff" sdnum="1033;1033;@" sdval="0" width="48%"&gt;&lt;div align="LEFT" style="font-style: normal; font-weight: normal; text-decoration: none;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-family: Thorndale,serif;"&gt;&lt;span style="font-size: small;"&gt;8&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt;    &lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td bgcolor="#ffffff" sdnum="1033;1033;General" sdval="45.6" width="52%"&gt;&lt;div align="LEFT" style="font-style: normal; font-weight: normal; text-decoration: none;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-family: Thorndale,serif;"&gt;&lt;span style="font-size: small;"&gt;45.6&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt;   &lt;/tr&gt;&lt;/tbody&gt;  &lt;tbody&gt;&lt;tr valign="TOP"&gt;    &lt;td bgcolor="#ffffff" sdnum="1033;1033;@" sdval="0" width="48%"&gt;&lt;div align="LEFT" style="font-style: normal; font-weight: normal; text-decoration: none;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-family: Thorndale,serif;"&gt;&lt;span style="font-size: small;"&gt;16&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt;    &lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td bgcolor="#ffffff" sdnum="1033;1033;General" sdval="22.8" width="52%"&gt;&lt;div align="LEFT" style="font-style: normal; font-weight: normal; text-decoration: none;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-family: 
Thorndale,serif;"&gt;&lt;span style="font-size: small;"&gt;22.8&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt;   &lt;/tr&gt;&lt;/tbody&gt;  &lt;tbody&gt;&lt;tr valign="TOP"&gt;    &lt;td bgcolor="#ffffff" sdnum="1033;1033;@" sdval="0" width="48%"&gt;&lt;div align="LEFT" style="font-style: normal; font-weight: normal; text-decoration: none;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-family: Thorndale,serif;"&gt;&lt;span style="font-size: small;"&gt;32&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt;    &lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td valign="top"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td bgcolor="#ffffff" sdnum="1033;1033;General" sdval="11.4" width="52%"&gt;&lt;div align="LEFT" style="font-style: normal; font-weight: normal; text-decoration: none;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-family: Thorndale,serif;"&gt;&lt;span style="font-size: small;"&gt;11.4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt;   &lt;/tr&gt;&lt;/tbody&gt; &lt;/table&gt;&lt;div style="margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-size: small;"&gt;Failures are not just more common with more DBMS hosts, but more difficult to handle.   Consider what happens in the example architecture when a tenant data server fails and has to be replaced with a standby copy.  The replacement must not only replicate correctly from the shared data server, but the analytic server must also be reconfigured to replicate correctly as well.  This is not a simple problem.  There's currently no replication product for open source databases that handles failures in these topologies without sooner or later becoming confused and/or leading to extended downtime. 
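&lt;br /&gt;&lt;br /&gt;As a rough check on the table above: the days-between-failures column is just the single-server MTBF divided by the number of hosts. Here is a minimal Python sketch of that arithmetic. It assumes that failures are independent and that every host has the same 365-day MTBF, which is my simplification rather than a measured result, and the function name days_between_failures is purely illustrative.

```python
# Sketch only: expected days between failures for a fleet of identical servers.
# Assumption: failures are independent and every host has the same MTBF.

def days_between_failures(mtbf_days, hosts):
    """Return the expected number of days between failures across all hosts."""
    return mtbf_days / hosts


if __name__ == "__main__":
    # Same host counts as the table; supply your own MTBF as needed.
    for hosts in (1, 2, 4, 8, 16, 32):
        print(hosts, days_between_failures(365, hosts))
```

The exact quotients (91.25 days for four hosts, for instance) are what the table shows rounded to one decimal place.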
&lt;/span&gt;&lt;/span&gt; &lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-size: small;"&gt;There is a third significant SaaS problem:&amp;nbsp; operations on tenants.  This includes provisioning new tenants or moving tenants from one database server to another without requiring extended downtime or application reconfiguration.&amp;nbsp; Backing up and restoring individual tenants is another common problem.  The one-database-per-tenant model is popular in part because it makes these operations much easier.  &lt;/span&gt;&lt;/span&gt; &lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-size: small;"&gt;Tenant operations are tractable when you just have a few customers.&amp;nbsp; In the same way that failures become more common with more hosts, tenant operations become more common as tenants multiply.  It is therefore critical to automate them as well as make the impact on other tenants as small as possible. 
&lt;/span&gt;&lt;/span&gt; &lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit; margin-bottom: 0in;"&gt;&lt;span style="color: black;"&gt;&lt;span style="font-size: small;"&gt;Complex topologies, failures, and tenant operations are just three of the issues that make SaaS database architectures interesting as well as challenging to design and deploy.&amp;nbsp; It is well worth thinking about how we can improve database clustering and replication to handle SaaS.&amp;nbsp; That is exactly what we are working on with Tungsten.&amp;nbsp; I hope you will follow me as we dive more deeply into SaaS problems and solutions over the next few months.&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: black;"&gt;&lt;span style="font-size: small;"&gt;P.s., If you run a SaaS and are interested in working with us on these features, please contact me at &lt;a href="http://www.continuent.com/"&gt;Continuent&lt;/a&gt;.&amp;nbsp; I'm not hard to find.&amp;nbsp;&lt;/span&gt;&lt;/span&gt; &lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-7353833054966054064?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/7353833054966054064/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=7353833054966054064' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/7353833054966054064'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/7353833054966054064'/><link rel='alternate' type='text/html' 
href='http://scale-out-blog.blogspot.com/2010/01/exploring-saas-architectures-and.html' title='Exploring SaaS Architectures and Database Clustering'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_26KnjtB2MFo/Sz_vuplJL2I/AAAAAAAAAEg/FeDE6u7omtE/s72-c/SaaS-Topology-Change.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-187229291692753402</id><published>2009-12-26T18:02:00.000-08:00</published><updated>2009-12-26T18:07:15.422-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><title type='text'>Proving Master/Slave Clusters Work and Learning along the Way</title><content type='html'>2009 has been a big year for Tungsten.  In January we had (barely) working replication for MySQL.   It had some neat features like global IDs and event filters, but to be frank you needed imagination to see the real value.  Since then, Tungsten has grown into a full-blown database clustering solution capable of handling a wide range of user problems.   
Here are just a few of the features we completed over the course of the year:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Autonomic cluster management using business rules to implement auto-discovery of new databases, failover, and quick recovery from failures&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Built-in broadcast monitoring of databases and replicators&lt;/li&gt;&lt;li&gt;Integrated backup and restore operations&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Pluggable replication management, proven by clustering implementations based on PostgreSQL Warm Standby and Londiste&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Multiple routing mechanisms to provide seamless failover and load balancing of SQL&lt;/li&gt;&lt;li&gt;Last, but not least, simple command line installation to configure and start Tungsten in minutes&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;You can see the results in our latest release, Tungsten 1.2.1, which comes in both open source and commercial flavors.  (See our &lt;a href="http://www.continuent.com/downloads/software"&gt;downloads page&lt;/a&gt; to get software as well as documentation.)&lt;br /&gt;&lt;br /&gt;In the latter part of 2009 we also worked through our first round of customer deployments, which was an adventure but helped Tungsten grow enormously.   Along the way, we confirmed a number of hunches and learned some completely new lessons.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Hardware is changing the database game.&lt;/span&gt;  In particular, performance improvements are shifting clustering in the direction of loosely coupled master/slave replication rather than tightly coupled multi-master approaches.  
As I laid out &lt;a href="http://scale-out-blog.blogspot.com/2009/09/future-of-database-clustering.html"&gt;in a previous article&lt;/a&gt;, the problem space is shifting from database performance to availability, data protection, and utilization.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Software as a Service (SaaS) is an important driver for replication technology.&lt;/span&gt;  Not only is the SaaS sector growing, but even small SaaS applications can result in complex database topologies that need parallel, bi-directional, and cross-site replication, among other features.  SaaS business economics tend to drive building these systems on open source databases like MySQL and PostgreSQL.  By supporting SaaS, you support many other applications as well.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Cluster management is hard but worthwhile.  &lt;/span&gt;Building distributed management with no single points-of-failure is a challenging problem and probably the place where Tungsten still has the most work to do.  Once you get it working, though, it's like magic.  We have been focused on trying to make management procedures not just simple but wherever possible to do away with them completely by making the cluster self-managing.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Business rules rock&lt;/span&gt;. We picked the &lt;a href="http://www.jboss.org/drools/"&gt;DROOLS&lt;/a&gt; rule engine to help control Tungsten and make it automatically reconfigure itself when data sources appear or fail.  The result has been an incredibly flexible system that is easy to diagnose and extend.  Just one example:  floating IP address support for master databases took 2 hours to implement using a couple of new rules that work alongside the existing rule set.  If you are not familiar with rules technology, there is still time to make a New Year's resolution to learn it in 2010.  
It's powerful stuff.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Clustering has to be transparent.  I mean &lt;span style="font-style: italic;"&gt;really&lt;/span&gt; transparent. &lt;/span&gt;  We were in denial on this subject before we started to work closely with ISPs, where you don't have the luxury of asking people to change code.  Tungsten Replicator is now close to a drop-in replacement for MySQL replication as a result.  We also implemented proxying based on packet inspection rather than translation and re-execution to raise throughput and reduce incompatibilities visible to applications.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Ship integrated, easy-to-use solutions.&lt;/span&gt;  We made the mistake of releasing Tungsten into open source as a set of components that users had to integrate themselves.  We have since recanted.  As penance we now ship fully integrated clusters with simple installation procedures even for open source editions and are steadily extending the installations to cover not just our own software but also database and network configuration.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Beyond the features and learning experiences, the real accomplishment of 2009 was to prove that integrated master/slave clusters can solve a wide range of problems from data protection to HA to performance scaling.  In fact, what we have implemented actually works a lot better than I expected when we began to design the system back in 2007.  (In case this sounds like a lack of faith, plausible ideas do not always work in the clustering field.)  If you have not tried Tungsten, download it and see if you share my opinion.&lt;br /&gt;&lt;br /&gt;Finally, keep watching Tungsten in 2010.  We are a long way from running out of ideas for making Tungsten both more capable and easier to use.  
It's going to be a great year.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-187229291692753402?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/187229291692753402/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=187229291692753402' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/187229291692753402'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/187229291692753402'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2009/12/proving-masterslave-clusters-work-and.html' title='Proving Master/Slave Clusters Work and Learning along the Way'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-5043203674160023278</id><published>2009-10-31T08:21:00.000-07:00</published><updated>2009-10-31T08:25:18.901-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Drizzle'/><title type='text'>Replicating from MySQL to Drizzle and Beyond</title><content type='html'>&lt;a href="http://drizzle.org/"&gt;Drizzle&lt;/a&gt; is one of the really great pieces of technology to emerge from the MySQL diaspora--a lightweight, scalable, and pluggable database 
for web applications.   I am therefore delighted that &lt;a href="http://developian.blogspot.com/2009/10/replication-from-mysql-to-drizzle-using.html"&gt;Marcus Erikkson has published a patch&lt;/a&gt; to &lt;a href="http://www.continuent.com/community"&gt;Tungsten&lt;/a&gt; that allows replication from MySQL to Drizzle.   He's also working on implementing Drizzle-to-Drizzle support, which will be very exciting.&lt;br /&gt;&lt;br /&gt;Marcus has submitted the patch to us and I have reviewed the code.  It's quite supportable, so I plan to integrate it as soon as we are done with our next Tungsten release, which will post around 5 November.   You will be able to build and run it using our new &lt;a href="http://scale-out-blog.blogspot.com/2009/10/community-builds-for-tungsten.html"&gt;community builds&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This brings up a question--what about replicating from MySQL to PostgreSQL?   What about other databases?  I get the PostgreSQL replication question fairly often but it may be a while before our in-house team can implement plug-in support for it.  Anybody want to submit a patch in the meantime?  Post in the &lt;a href="http://www.continuent.com/community/forum?func=showcat&amp;amp;catid=2"&gt;Tungsten forums&lt;/a&gt; if you have ideas and need help to get the work done.   
Tungsten Replicator code is very modular and it is not hard to add new database support.&lt;br /&gt;&lt;br /&gt;Meanwhile, go Marcus!!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-5043203674160023278?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/5043203674160023278/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=5043203674160023278' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5043203674160023278'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5043203674160023278'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2009/10/replicating-from-mysql-to-drizzle-and.html' title='Replicating from MySQL to Drizzle and Beyond'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-6283700173224774813</id><published>2009-10-31T07:54:00.000-07:00</published><updated>2009-10-31T08:37:05.619-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Oracle'/><title type='text'>Community Builds for Tungsten Clustering</title><content type='html'>It's been almost two 
months since I have posted anything on the Scale-Out Blog, as our entire team has been heads-down working on &lt;a href="http://www.continuent.com/community"&gt;Tungsten&lt;/a&gt;. We now have a number of accomplishments that are worth writing articles about.    Item one on that list is community builds for Tungsten clusters.&lt;br /&gt;&lt;br /&gt;Tungsten community builds offer a bone-simple process to check out and build Tungsten clustering software. The result is a fully integrated package that includes replication, management, monitoring, and SQL routing.  The community builds work for MySQL 5.0 and 5.1 and also allow you to set up basic replication from MySQL to Oracle.&lt;br /&gt;&lt;br /&gt;Community builds &lt;span style="font-style: italic; font-weight: bold;"&gt;do not&lt;/span&gt; include much logic for autonomic management, including automated failover and sophisticated rules that keep databases up and running rain or shine.  Those and other features like floating IP address support are part of the commercial Tungsten software.   PostgreSQL and Oracle-to-Oracle support is also commercial only, at least for the time being.&lt;br /&gt;&lt;br /&gt;Community builds &lt;span style="font-weight: bold; font-style: italic;"&gt;do&lt;/span&gt; include our standard installation process, which allows you to set up a working cluster in a few minutes.   You can back up and restore databases, check liveness of cluster members, fail over master databases for maintenance, and use a lot of other handy features.   There is also full documentation, located &lt;a href="http://www.continuent.com/downloads/documentation"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;To get started, you need a host running Mac OS X, Linux, or Solaris that meets the following prerequisites.  On Linux you can usually satisfy these requirements using Yum or Apt-get if the required software is not already there.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Java JDK 1.5 or higher. 
&lt;/li&gt;&lt;li&gt;Ant 1.7.0 or higher for builds&lt;/li&gt;&lt;li&gt;Subversion.  We use version 1.6.1&lt;/li&gt;&lt;li&gt;MySQL 5.0 or 5.1 (only on hosts where cluster is installed)&lt;/li&gt;&lt;li&gt;Ruby 1.8.5 or greater (only on hosts where cluster is installed)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Now you can grab the software and do a build. Make a work directory, cd to it, and enter the following commands.  (Due to truncation on the blog the SVN URL looks a little funny.  Don't be fooled.)&lt;br /&gt;&lt;pre&gt;&lt;code&gt;svn checkout \&lt;br /&gt;https://tungsten.svn.sourceforge.net/\&lt;br /&gt;svnroot/tungsten/trunk/community&lt;br /&gt;&lt;br /&gt;cd community&lt;br /&gt;./release-community.sh    # (Press ENTER when prompted)&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;The release-community.sh script checks out most of the Tungsten code for you and does a build.  &lt;span style="font-weight: bold;"&gt;IMPORTANT NOTE&lt;/span&gt;:  The command shown above builds SVN HEAD, which means you may have a life of adventure.  You can also build off branches which are more or less stable.  Look at the available config files in the community directory.&lt;br /&gt;&lt;br /&gt;After the build finishes, you have ready-to-install clustering software.  You can scp the resulting tar.gz file out to another host or just cd directly into the build itself as shown below and run the configure script, which sets up Tungsten software on a single host.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;cd build/tungsten-community-2009-1.2&lt;br /&gt;./configure&lt;/code&gt;&lt;/pre&gt;You may need to read the manuals so you get all the answers right.  The installation manual is posted &lt;a href="http://www.continuent.com/downloads/documentation"&gt;here&lt;/a&gt; at &lt;a href="http://www.continuent.com/"&gt;www.continuent.com&lt;/a&gt;.   You'll also need to look at the Replication Guide, Chapter 2 to see how to set up MySQL properly.  
We'll do that automatically in the future, but for now it's help yourself.  (Don't worry: the database set-up is easy.)&lt;br /&gt;&lt;br /&gt;To make the cluster interesting you should install on at least a couple of hosts.   Here's what an installed cluster looks like using the Tungsten cluster control (cctrl) program.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;[tungsten@centos5a tungsten-community-2009-1.2]$ tungsten-manager/bin/cctrl&lt;br /&gt;[LOGICAL] /cluster/comm/&gt; ls&lt;br /&gt;&lt;br /&gt;COORDINATOR[centos5a:MANUAL]&lt;br /&gt;&lt;br /&gt;ROUTERS:&lt;br /&gt;+-----------------------------------------------------------------------+&lt;br /&gt;|NONE                                                                   |&lt;br /&gt;+-----------------------------------------------------------------------+&lt;br /&gt;&lt;br /&gt;DATASOURCES:&lt;br /&gt;+-----------------------------------------------------------------------+&lt;br /&gt;|centos5a(master:ONLINE, progress=3)                                    |&lt;br /&gt;+-----------------------------------------------------------------------+&lt;br /&gt;|  REPLICATOR(role=master, state=ONLINE)                                |&lt;br /&gt;|  DATASERVER(state=ONLINE)                                             |&lt;br /&gt;+-----------------------------------------------------------------------+&lt;br /&gt;&lt;br /&gt;+-----------------------------------------------------------------------+&lt;br /&gt;|centos5b(slave:ONLINE, progress=3, latency=0.0)                        |&lt;br /&gt;+-----------------------------------------------------------------------+&lt;br /&gt;|  REPLICATOR(role=slave, master=centos5a, state=ONLINE)                |&lt;br /&gt;|  DATASERVER(state=ONLINE)                                             |&lt;br /&gt;+-----------------------------------------------------------------------+&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;Starting from scratch and pulling code from SourceForge,  it takes me about 30 
minutes to get to an installed cluster with two nodes.  At this point you have access to a very powerful set of tools to protect data, keep your databases available, and scale performance.   Look at the manuals.  Try it out.  If you have questions or feedback, post them in the &lt;a href="http://www.continuent.com/community/forum"&gt;Tungsten forums&lt;/a&gt;.   In the meantime, have fun with your database cluster.&lt;br /&gt;&lt;br /&gt;p.s., We will post binary builds next week.  The current build is in final release checks, so you may notice a few problems--I hit a Ruby warning on configuration that will be fixed shortly.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-6283700173224774813?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/6283700173224774813/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=6283700173224774813' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/6283700173224774813'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/6283700173224774813'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2009/10/community-builds-for-tungsten.html' title='Community Builds for Tungsten Clustering'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-3153279976151621659</id><published>2009-09-05T09:46:00.000-07:00</published><updated>2009-09-05T09:46:42.091-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><title type='text'>Tungsten Replicator 1.0.3 Release</title><content type='html'>&lt;a href="http://www.continuent.com/community/tungsten-replicator"&gt;Tungsten Replicator&lt;/a&gt; version 1.0.3 is now released and available as a &lt;a href="https://sourceforge.net/projects/tungsten/files/"&gt;download from SourceForge&lt;/a&gt;.  Tungsten Replicator provides advanced, platform-independent replication for MySQL 5.0/5.1 with global transaction IDs, crash-safe slaves, flexible filtering, and built-in consistency checking.    The 1.0.3 release adds backup and restore, which I described in a &lt;a href="http://scale-out-blog.blogspot.com/2009/07/backups-backups-backups-and-restore.html"&gt;previous blog article&lt;/a&gt;.  &lt;br /&gt;&lt;br /&gt;In addition, there are &lt;a href="http://forge.continuent.org/jira/browse/TREP?report=com.atlassian.jira.plugin.system.project:changelog-panel"&gt;numerous small feature additions and some great bug fixes&lt;/a&gt; that raise performance and stability for large-scale deployment.  For example, the replicator now goes online in seconds even when there are millions of rows in the history table.  This fixes our previous go-online performance, which was, er, &lt;a href="http://forge.continuent.org/jira/browse/TREP-316"&gt;pretty slow&lt;/a&gt;.  
Thanks to our users in the &lt;a href="http://www.continuent.com/community/tungsten-replicator/forum"&gt;Continuent forums&lt;/a&gt; for helping us to track down this problem as well as several others.&lt;br /&gt;&lt;br /&gt;As of the 1.0.3 release we are also starting to offer the enterprise documentation for the open source replicator.  I think this provides better documentation all around, not least of all because we can do a better job of maintaining a single copy.  Get current replicator documentation &lt;a href="http://www.continuent.com/downloads/documentation"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-3153279976151621659?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/3153279976151621659/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=3153279976151621659' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3153279976151621659'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/3153279976151621659'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2009/09/tungsten-replicator-103-release.html' title='Tungsten Replicator 1.0.3 Release'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-5502983661631528636</id><published>2009-09-01T13:29:00.000-07:00</published><updated>2009-09-02T07:42:20.788-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Drizzle'/><category scheme='http://www.blogger.com/atom/ns#' term='Oracle'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>The Future of Database Clustering</title><content type='html'>Baron Schwartz started &lt;a href="http://www.xaprb.com/blog/2009/08/30/failure-scenarios-and-solutions-in-master-master-replication/"&gt;a good discussion about MMM use cases&lt;/a&gt; that quickly veered into an argument about clustering in general.   As Florian Haas put it on his blog, &lt;a href="http://fghaas.wordpress.com/2009/08/31/on-mysql-replication-cluster-managers-and-drbd-again/"&gt;this is not just an issue of DRBD vs. MySQL Replication&lt;/a&gt;.  Is a database cluster something you cobble together from bits and pieces like &lt;a href="http://mysql-mmm.org/"&gt;MMM&lt;/a&gt;?  Or is it something integrated that we can really call a cluster?  This is the core question that will determine the future of clustering for open source databases. &lt;br /&gt;&lt;br /&gt;I have a strong personal interest in this question, because &lt;a href="http://www.continuent.com/community"&gt;Tungsten clustering&lt;/a&gt;, which I designed, is betting that the answer is changing in two fundamental ways.  First, the problems that clustering solves are evolving, which in turn will lead to significant changes in off-the-shelf clusters.  
Second, for most users the new clusters will be far better than solutions built from a bunch of individual pieces.&lt;br /&gt;&lt;br /&gt;To see why, let's start with some history of the people who use open source databases and why they have been interested in clustering over the last decade or so.   Open source databases have a wide range of users, but there are a couple of particularly significant groups.    Small- to medium-sized business applications like content management systems are a very large segment.  Large web-facing applications like &lt;a href="http://www.facebook.com/"&gt;Facebook&lt;/a&gt; or &lt;a href="http://www.gamespot.com/"&gt;GameSpot&lt;/a&gt; are another.   Then there are a lot of custom applications that are somewhere in between--too big to fit on a single dual- or quad-core database server but completely satisfied with the processing power of 2 to 4 servers.&lt;br /&gt;&lt;br /&gt;For a long time all of these groups of users introduced clusters for two main reasons:  ensuring availability and raising performance.  Spreading processing across a cluster of smaller commodity machines was a good solution to both requirements and explains the enormous popularity of MySQL Replication as well as many less-than-successful attempts to implement multi-master clustering.  However, the state of the art has evolved in a big way in the last couple of years.&lt;br /&gt;&lt;br /&gt;The reason for change is simple:  hardware.  Multi-core architectures, cheap DRAM, and flash memory are changing not just the cost of databases but the fundamental assumptions of database computing.  Pull out your dog-eared copy of &lt;a href="http://www.amazon.com/Transaction-Processing-Concepts-Techniques-Management/dp/1558601902"&gt;Transaction Processing&lt;/a&gt; by Gray and Reuter, and have a look at the 1991 price/performance trade-offs for memory inside the front cover.  
Then look at any recent graph of DRAM and flash memory prices (&lt;a href="http://www.storagesearch.com/ssd-ram-flash%20pricing.html"&gt;like this one&lt;/a&gt;). For example, within a couple of years it will be practical to have even relatively large databases on SSDs.  Assuming reasonable software support, random reads and writes to "disk" will approach main memory speeds.  Dirt-cheap disk archives are already spread across the Internet.    The old graph of costs down to off-line tape has collapsed.&lt;br /&gt;&lt;br /&gt;Moreover, open source databases are also starting to catch up with the hardware.  In the MySQL community both MySQL 5.4 and Drizzle are focused on multi-core scaling.  PostgreSQL has been working on this problem for years as well.  Commercial vendors like &lt;a href="http://www.schoonerinfotech.com/"&gt;Schooner&lt;/a&gt; are pushing the boundaries with custom appliances that integrate new hardware better than most users can do it themselves and add substantial database performance improvements to boot.&lt;br /&gt;&lt;br /&gt;With better multi-core utilization plus cheap memory and SSDs, the vast majority of users will be able to run applications with adequate performance on a single database host rather than the 2 to 4 nodes of yore.  In other words, performance scaling is rapidly becoming a non-issue for a larger and larger group of users.  These users don't need infinite performance any more than they need infinite features in a word processing program.  What's already there is enough, or will be within the next year or two.&lt;br /&gt;&lt;br /&gt;Performance is therefore receding as a motivation for clustering.  Meanwhile, here are three needs that will drive database clustering of open source SQL databases over the next few years.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Availability&lt;/span&gt;.  
Keeping databases alive has always been the number one concern for open source database users, even back in the days when hosts and databases were less capable. This is not a guess.  I have talked to hundreds of them since early 2006.  Moreover, most users just don't have the time to cover all the corner cases themselves and want something that just works without a lot of integration and configuration.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Data protection&lt;/span&gt;.  Losing data is really bad.  For most users nirvana is verified, up-to-the-minute copies of data without having to worry a great deal about how it happens.  Off-site protection is pretty big too.  Talk to any DBA if you don't believe how important this problem is.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Hardware utilization&lt;/span&gt;.  With the dropping cost of hardware, concerns about up-front hardware investment are becoming somewhat outdated. Operational costs are a different matter.   Let's  look at power consumption and assume a dual CPU host drawing 250W, which we double to 500W to allow for cooling and other overhead.  Running around the clock, that comes to about 4,380 kilowatt-hours per year.  Using &lt;a href="http://www.eia.doe.gov/cneaf/electricity/epm/table5_6_a.html"&gt;recent industrial electricity rates of 13.51 cents per kilowatt-hour in California&lt;/a&gt; you get an electric bill of around $600 per year.  Electricity is just one part of operational expenses, which add up very quickly.  &lt;span style="font-style: italic;"&gt;(Thanks to an alert reader for correcting my math in the original post.)&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;We will continue to see database clusters in the future: in fact lots of them.  But off-the-shelf clusters that meet the newer requirements in an efficient and cost-effective way for open source databases are going to look quite different from tightly coupled master/master or shared disk clusters like Postgres-R and RAC.   
Instead, we will see clusters based for the most part on far more scalable master/slave replication and with features that give them many of the same cluster benefits but cover a wider range of needs.  To the extent that other approaches remain viable in the mass market, they will need to cover these needs as well.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Simple management&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; and monitoring&lt;/span&gt; - The biggest complaint about clustering is that it's complicated.  That's a solvable problem or should be once you can work with master/slave methods instead of more complex approaches.  You can use group communications to auto-discover and auto-provision databases.  You can control failover using simple, configurable policies based on business rules.  You can schedule recurring tasks like backups using job management queues.  You can have installations that pop up and just work.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Fast, flexible replication&lt;/span&gt; - Big servers create big update loads and overwhelm single-threaded slaves.  We either need parallel database replication or disk-level approaches like the proposed PostgreSQL 8.5 log-streaming/hot standby or &lt;a href="http://www.drbd.org/"&gt;DRBD&lt;/a&gt;.  Synchronous replication is a requirement for many users.  Cross-site replication is increasingly common as well.  
Finally, replication methods will need to be pluggable, because different replication methods have different strengths; replication itself is just one part of the clustering solution, which for the most part is the same regardless of the replication type.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Top-to-bottom data protection&lt;/span&gt; - Simple backup integration is a good start, but the list of needs is far longer:  off-site data storage, automatic data consistency checks, and data repair are on the short list of necessary features.  Most clustering and replication frameworks offer little or nothing in this area even though replica provisioning is often closely tied to backups.    Yet for many users integrated data protection will be the single biggest benefit of the new clustering approach.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Partition management&lt;/span&gt; - In the near future most applications will fit on a single database server, but most organizations have multiple applications while ISPs run many thousands of them.  There need to be ways to assign specific databases to partitions and then allow applications to locate them transparently.  This type of large-scale sharding is the problem that remains when single application databases can run on a single host.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Cloud and virtualized operation&lt;/span&gt; - In the long run virtualization is the simplest cure for hardware utilization problems--far easier and more transparent than other approaches.   A large number of applications now run on virtual machines at ISPs or in cloud environments like Amazon for this reason.  To operate in virtual environments, database clusters must be software only, have simple installation, and make very minimal assumptions about resources.  
Also, they need to support seamless database provisioning as capacity needs rise and fall, for example adding new VMs or moving from an existing 4-core VM to a larger 8-core VM with more memory as demand shifts.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Transparent application access&lt;/span&gt; - Applications need to be able to connect to clusters seamlessly using accustomed APIs and without SQL changes.  This is actually easier to do on databases that use simple master/slave or disk block methods rather than more complex clustering implementations.  (Case in point: porting existing applications to &lt;a href="http://www.mysql.com/products/database/cluster/"&gt;MySQL Cluster&lt;/a&gt;.)   Also, the application access needs to be able to handle simple performance-based routing, such as directing reports or backups to a replica database. The performance scaling that most users now need is just not that complicated.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Open source&lt;/span&gt; - For a variety of reasons closed approaches to clustering are doomed to insignificance in the open source database markets.  The base clustering components have to be open source as some of them will depend on extensions of existing open source technology down to the level of storage and database log changes.  You also need the feedback loops and distribution that open source provides to create mass-market solutions.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;What I have just described is exactly what we are building with Tungsten.  Tungsten is aimed at the increasingly large number of applications that can run on a single database.  We can help with database performance too, of course, but we recognize that over time other issues will loom larger for most users.  The technical properties described above are tractable to implement and we have a number of them already with more on the way in the near future.  
Master/slave clustering is not just feasible--it works, and works well for a wide range of users.&lt;br /&gt;&lt;br /&gt;Still, I don't want anyone to mistake my point.  There are many applications for which performance is a very serious problem or whose other needs cannot possibly be met by off-the-shelf software.  Facebook and other large sites will continue to use massive, custom-built MySQL clusters as well as non-SQL approaches that push the state of the art for scaling and availability.  Analytics and reporting will continue to require ever larger databases with parallel query and automatic partitioning of data as &lt;a href="http://www.asterdata.com/"&gt;Aster&lt;/a&gt; and &lt;a href="http://www.greenplum.com/"&gt;GreenPlum&lt;/a&gt; do.  There are specialized applications like Telco provisioning that really do require a tightly coupled cluster and where it's worth the effort to rewrite the application so it works well in such an environment.  These are all special cases at the high end of the market.&lt;br /&gt;&lt;br /&gt;Mainstream users need something that's a lot simpler and frankly more practical to deliver as an off-the-shelf cluster.   Given the choice between combining a number of technologies like MMM, backups of various flavors, cron jobs, Maatkit, etc., a lot of people are just going to choose something that pops up and works.   The hardware capability shift and corresponding database improvements are tilting the field to clustering solutions like Tungsten that are practical to implement, cover the real needs of users, and are fully integrated.  I'm betting that for a sizable number of users this is the future of database clustering.&lt;br /&gt;&lt;br /&gt;p.s., We have had a long summer of work on Tungsten, which is why this blog has not been as active as in some previous months.  We are working on getting a &lt;span style="font-style: italic;"&gt;full clustering solution out in open source&lt;/span&gt; during the week of September 7th.  
For more information check out full documentation of open source and commercial products &lt;a href="http://www.continuent.com/downloads/documentation"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-5502983661631528636?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/5502983661631528636/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=5502983661631528636' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5502983661631528636'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5502983661631528636'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2009/09/future-of-database-clustering.html' title='The Future of Database Clustering'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-6376360328229522298</id><published>2009-08-16T00:14:00.000-07:00</published><updated>2009-08-16T00:15:28.935-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Drizzle'/><title type='text'>Tungsten Welcomes Your Contributions!</title><content type='html'>&lt;a href="http://www.continuent.com/community"&gt;Tungsten clustering and 
replication&lt;/a&gt; has been accessible as open source for almost a year, but it has taken us an amazingly long time to get our contribution policy set up.  The dithering ended promptly after Monty Widenius wrote an excellent blog article on &lt;a href="http://monty-says.blogspot.com/2009/08/thoughts-about-dual-licensing-open.html"&gt;dual-licensed software&lt;/a&gt; from his experiences at &lt;a href="http://askmonty.org/wiki/index.php/Main_Page"&gt;AskMonty.org&lt;/a&gt; and previously at MySQL AB.  One of the things I especially like is Monty's emphasis on contributor rights.  Contributor rights create the sense of reciprocity that makes open source function effectively as a development model.  Tungsten is henceforth adopting the AskMonty.org contribution model. &lt;br /&gt;&lt;br /&gt;So, if you want to contribute code to Tungsten (I'll describe shortly why you might want to do that), you first fill out our handy &lt;a href="http://www.continuent.com/community/licensing/code-contributor-agreement"&gt;Code Contributor Agreement&lt;/a&gt; and send it to us.  The CCA says that you grant us rights to use the code within Tungsten as if we had written it ourselves, which includes selling it in licensed versions of Tungsten.  At the same time, you retain your rights to the code and can also use it for any purpose you please including donating it to other projects, selling it, licensing it commercially, etc.   It is really a very simple agreement.  We also plan to match further protections for contributors as AskMonty.org adopts them. &lt;br /&gt;&lt;br /&gt;After you send us the CCA, you can send us patches.  Why would you want to do so?  Recall that Tungsten allows you to create database clusters that protect you from data loss, fail over quickly to replicas, and scale performance by spreading work around multiple data copies.  Here are just a few ways you can help Tungsten do that better:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;New types of backups.  
Anybody want to add built-in backups using InnoDB Hot Backup?  How about storing backup files on Amazon S3?  Note:  The second project is very high up my personal list so you'll need to hustle.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Replication event filtering.  Our forums see regular discussions about filtering (&lt;a href="http://www.continuent.com/forum?func=view&amp;amp;id=242&amp;amp;catid=2"&gt;like this one&lt;/a&gt;).  What about a handy filter to remove specific databases, tables, or columns when moving data from masters to slaves?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Support for new databases.  Anybody need to replicate from MySQL to Amazon SimpleDB?   How about Drizzle or PostgreSQL?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Sharding.  Anybody need to connect transparently to databases spread across multiple shards?  You can do it by extending SQL Router load balancing.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Fixing things that are broken. &lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;We will accept any patch that provides beneficial improvements to Tungsten and will make an effort to integrate it quickly.  In some cases we may have to rewrite them, which will of course delay integration.  The more you work with us the faster that integration will occur. &lt;br /&gt;&lt;br /&gt;Speaking of patches, we got an initial contribution to implement replication from MySQL into Drizzle earlier in the summer.  
Now that we have all the licensing paperwork in order it's time to get that one properly integrated.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-6376360328229522298?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/6376360328229522298/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=6376360328229522298' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/6376360328229522298'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/6376360328229522298'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2009/08/tungsten-welcomes-your-contributions.html' title='Tungsten Welcomes Your Contributions!'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-118166934777226715</id><published>2009-08-09T14:15:00.000-07:00</published><updated>2009-08-09T15:50:17.034-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Drizzle'/><category scheme='http://www.blogger.com/atom/ns#' term='IT Industry'/><title type='text'>Building the Open Source Hackers Cooperative</title><content type='html'>It is striking how much harder it is to make money from open 
source than to write it in the first place.  Open source development is a sophisticated and well-understood social activity.  However, the economic model is often laughably primitive:  "if you build it, they will come."  That applies to the question of turning your open source project into a real job.  More interestingly, it applies to the question of how to make open source projects as valuable as possible to the largest number of people.  In this post I would like to propose an answer to both questions. &lt;br /&gt;&lt;br /&gt;To illustrate open source sophistication, just look at how easy it has become to start and manage projects.  It is almost a cookie-cutter procedure.  You pick one of a number of well known licenses, manage the code on &lt;a href="http://sourceforge.net/"&gt;SourceForge.net&lt;/a&gt; or &lt;a href="http://launchpad.net/"&gt;Launchpad&lt;/a&gt;, communicate with the project through Skype and mailing lists, and tell the world about it using your blog plus &lt;a href="http://twitter.com/"&gt;Twitter&lt;/a&gt;. Within an afternoon you have set up infrastructure to support efficient collaborative development with team members from Seattle to Singapore.   The number of projects itself makes the point:  SourceForge.net alone has &lt;a href="https://sourceforge.net/apps/trac/sourceforge/wiki/What%20is%20SourceForge.net?"&gt;over 230,000 projects&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Things get much harder if you want to make a decent living from a successful project.  It’s not that there is a lack of models to choose from. You could form a commercial venture that offers closed source/licensed versions of your open source project.   However, as many of us have seen with MySQL AB and other companies, commercial efforts have a tendency to conflict badly with broader community interests or those of other companies that might be useful partners.  
For this reason, they don’t even necessarily maximize the value of the original commercial effort.&lt;br /&gt;&lt;br /&gt;You could of course try to create a foundation like Apache or Eclipse.  However, these generally require established software and large commercial backers.  The current experience of the fledgling &lt;a href="http://opendatabasealliance.com/"&gt;Open Database Alliance&lt;/a&gt; and other efforts shows that it can be quite complex to create such organizations from scratch even with well-established code and well-known players. This is not a model that can be repeated across a wide range of different projects large and small.&lt;br /&gt;&lt;br /&gt;Finally, you could sell consulting and support for your project.  The problem with the consulting model is that it is not scalable: in order to make a decent living you have to work on customer problems.  The pro-bono work like extending the project or tending to a community tends to fall by the wayside unless you can get a customer to pay for it.&lt;br /&gt;&lt;br /&gt;The fact that it is hard to make money off open source is a symptom of a larger problem:  we are  losing the wider social benefits that for many people are the real point of open source software.  This is an economic problem:  how do we allow hackers to make a reasonable living on open source projects while maximizing the long-term value of the software to the widest possible number of users?  
It turns out there’s a reasonable economic model that can do this:  &lt;a href="http://en.wikipedia.org/wiki/Co-op"&gt;cooperatives&lt;/a&gt;, which are defined as follows in Wikipedia.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;a legal entity owned and democratically controlled [equally] by its members&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Cooperatives have existed for centuries in many different forms and have successfully solved problems ranging from providing student housing to delivering consumer goods like sporting equipment on a grand scale.  We need a new form of cooperative that I propose to call the &lt;span style="font-style: italic;"&gt;Open Source Hackers Cooperative,&lt;/span&gt; or Hackers Co-op for short.  The Hackers Co-op has three different but co-equal types of members:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Hackers&lt;/span&gt;, who are the core committers to the project and stewards of the code&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Sponsors&lt;/span&gt;, who supply funding and labor&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Community members&lt;/span&gt;, who use the software, test it, and contribute patches&lt;/li&gt;&lt;/ul&gt;As we will see, some members are more equal than others, which is why I added brackets in the cooperative definition.&lt;br /&gt;&lt;br /&gt;Here are the organizational principles for an Open Source Hackers Cooperative:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;The Co-op is a non-profit&lt;/span&gt;.  It’s not for sale and there is no exit strategy.   Like all co-ops, it exists to maximize benefits for its members.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Hackers work directly for the Co-op&lt;/span&gt;.  
Their time is divided between implementing features that interest them, integrating patches and fixing bugs reported by the community, and implementing features for sponsors. &lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Sponsors provide long-term funds and/or labor to the Co-op&lt;/span&gt;.  Sponsors build businesses on the open source software and kick back a percentage of the business value in return for support and new features.  They can also contribute labor for specific co-op tasks.  Sponsors need not be for-profit businesses.  They could in some cases even be governments or NGOs.  The point is that they fund the software based on its value, for example through grants.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Community members provide leverage to the development model&lt;/span&gt;.  They use the software, provide basic support through forums, and contribute patches for bugs and small features. These activities give leverage to the hackers, who can use community patches and feedback to evolve the software.  This is a very efficient development model.  &lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;The Co-op is a democracy.  &lt;/span&gt;Co-op members vote on allocation of hacker resources.  To keep a single group of members from hijacking the entire Co-op, hacker time is divided into a pie with allocations for the interests of each member type.  Sponsors vote for their section of the pie using a vote weighted by their relative funding contributions--sponsors are deliberately not equal, which encourages competition to commit more funds.  Hackers effectively vote their portion by doing whatever they want with the time they get for personal projects.  Community members vote through surveys or some other reasonable mechanism.  
&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;The Co-op has elected officers.&lt;/span&gt;  There is a chief economist who is in charge of the business model and plans finances, contracts with sponsors, arranges employment contracts, etc.  There is also a chief technologist who ensures project infrastructure and moderates technical discussions.   Co-op officers are elected or at least approved by the members at large.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;The Co-op pays dividends&lt;/span&gt;.  Some open source projects are quite valuable, so a well-run co-op could easily become very profitable.  The excess profits are distributed to hackers in the form of retirement and other benefits,  to sponsors in the form of cash rebates, and to community members in the form of conferences, hiring of new hackers to work on features, improved infrastructure, etc.  &lt;/li&gt;&lt;/ol&gt;You can try to run through a number of scenarios for the hacker co-operative to see how well the model holds together.  All you really need is software with enough intrinsic value that it can sustain an active, technically aware community and where sponsors are motivated to build businesses on it but do not need to own the engineering.   It is helpful to have community members be programmers, but you can also design the software to allow even non-technical users to contribute effectively.&lt;br /&gt;&lt;br /&gt;Such conditions hold for a variety of system software like database management systems, app servers, and communications packages.  With a little thought they could apply broadly to user applications like accounting systems or voting software, which so far are relatively rare in open source.  
The Co-op model is quite stable, because it aligns interests in such a way that everyone does better if they stick together.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Hackers earn a stable, comfortable living and public recognition for working on software that they enjoy.   “Comfortable” in this context means salary and benefits equivalent to those of a typical EU country like Germany or Finland, which are the gold standard for employee compensation.  This works economically because hackers (a) are very productive to begin with and (b) become more so by leveraging an active user community.   Hackers are motivated to produce because the more viable the software is, the better the co-op does, and the more benefits they receive. &lt;/li&gt;&lt;li&gt;Sponsors get features that they need using a productive open source hacking development model.  This is a replacement for models like trying to take the software private and farming it out to low-cost offshore locations, which experience has shown to be badly broken on a number of levels.  More important, sponsors get stability in the sense that the software cannot be taken over by hostile corporate interests and is supported over the long term, which lowers the risk of building businesses on it.  They are motivated to contribute more in order to vote more resources for tasks that help their businesses. &lt;br /&gt;&lt;/li&gt;&lt;li&gt;Community members get software that is continuously supported and evolving rapidly to add features they need.  They get assurance that valuable patches will be integrated.   They are motivated to use, test, and develop patches for the software as that further increases its value to them and leads to recognition as authorities on the project.   &lt;/li&gt;&lt;/ol&gt;The Open Source Hackers Cooperative can be structured to create a number of virtuous feedback loops that will support and extend the software over a long period of time.  
The Hackers Co-op model could be standardized and even backed up with software as well as cookie-cutter legal documents so that it becomes very simple to set up and manage.&lt;br /&gt;&lt;br /&gt;I don't know if there are projects already following this model.   &lt;a href="http://www.chesnok.com/daily/2009/04/25/the-future-of-free-and-open-source-support-models/"&gt;Selena Deckelmann already used the term "Hacker's Cooperative"&lt;/a&gt; though from a somewhat different perspective.   If you know of anyone who has worked through the cooperative economics fully I would appreciate hearing from you.  Meanwhile, this is not a theoretical question for at least a couple of reasons.&lt;br /&gt;&lt;br /&gt;First, I work for a for-profit company that is willing to sponsor projects in exactly the way described in this article.  We are looking for (a) control, (b) stability, and (c) a development model that is cheaper than doing it ourselves.   If such cooperatives existed we would be interested in them.  I'm sure we are not alone, because this is how all for-profit businesses tend to think.  The cooperative is a viable model because of the way it reinforces and maximizes member interests.&lt;br /&gt;&lt;br /&gt;Second, the Hackers Co-op leads to some interesting questions about how to design software that promotes the leveraging effect of an open source community.  It is possible to design even application software so that your users don't all have to be coders in order to contribute.   If you want to create something really big out of your open source project (or just have a nice job) it helps to think these issues through before you write much code.  
Maybe it's the first thing on the list after you set up your new project.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-118166934777226715?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/118166934777226715/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=118166934777226715' title='19 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/118166934777226715'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/118166934777226715'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2009/08/building-open-source-hackers.html' title='Building the Open Source Hackers Cooperative'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>19</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-5521540064976781845</id><published>2009-07-20T23:38:00.000-07:00</published><updated>2009-07-20T23:41:17.715-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><title type='text'>Backups, Backups, Backups (and Restore)</title><content type='html'>&lt;a href="http://www.continuent.com/community/tungsten-replicator"&gt;Tungsten Replicator&lt;/a&gt; has built-in backup and restore for MySQL!   I checked in the final touches over the weekend. 
Here's how to run a backup on a database and store it so you can restore it later.  If you leave off options, we use the default backup procedure and storage that you select when setting up replication.&lt;br /&gt;&lt;pre&gt;trepctl backup [-backup agent] [-storage agent] [-limit timeout]&lt;br /&gt;&lt;/pre&gt;And here's how to restore.  If you leave off the options, we find the latest backup in your default storage and load that.&lt;br /&gt;&lt;pre&gt;trepctl restore [-uri backup_uri] [-limit timeout]&lt;br /&gt;&lt;/pre&gt;That's the syntax.  Now here's what happens behind the scenes.  First, Tungsten Replicator has a new BackupAgent plug-in that implements backup procedures. We have a backup agent for each of the following types of backups for MySQL:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Mysqldump - Probably least useful but easy to set up.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;LVM snapshot to tar.gz - Scalable with minimal database downtime.  Features are similar to Lenz Grimmer's excellent &lt;a href="http://www.lenzg.net/mylvmbackup/"&gt;mylvmbackup&lt;/a&gt; script.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Script dump - Integrate your own script for backup and restore.  The script has to follow some very simple conventions, for which there is an example.  You can integrate practically any backup/restore package this way.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Speaking as an unbiased user of the system, I love the LVM snapshots. LVM snapshot is overall the most convenient way to do backups for a wide range of databases, not just MySQL, though I have to admit I have not used either &lt;a href="http://www.innodb.com/products/hot-backup/"&gt;InnoDB Hot Backup&lt;/a&gt; or &lt;a href="http://www.percona.com/docs/wiki/percona-xtrabackup:start"&gt;Percona's XtraBackup&lt;/a&gt;.   
I guess now would be a good time to try them since we can integrate them through the script dump mechanism.&lt;br /&gt;&lt;br /&gt;Meanwhile, mysqldump is far and away my least favorite mechanism, not least because of &lt;a href="http://bugs.mysql.com/bug.php?id=34306"&gt;this unusually heinous bug&lt;/a&gt;, which completely breaks the &lt;code&gt;mysqldump --all-databases&lt;/code&gt; command.  It's still not fixed as of at least MySQL 5.1.34.  (How on earth did this one get in and why didn't it get corrected instantly?)&lt;br /&gt;&lt;br /&gt;Second, there is a new StorageAgent plug-in to handle storing and retrieving backup files.  There is one of these for each type of storage.  Currently the choice is limited to shared disk but I expect we'll have an Amazon S3 storage plug-in in the near future.  That's just too useful to pass up for very long.  Among other things, we run all our company services on Amazon and I would like to use it for our own backups.&lt;br /&gt;&lt;br /&gt;If you want to use the new backup capability you can either build Tungsten Replicator yourself using the instructions on our &lt;a href="http://www.continuent.com/community/tungsten-replicator/getting-started"&gt;getting started page&lt;/a&gt; or wait until the next binary build comes out in a couple of weeks.   Backup and restore are documented &lt;a href="http://tungsten.sourceforge.net/docs/Tungsten-Replicator-Guide/Tungsten-Replicator-Guide.html#basic.backups"&gt;here&lt;/a&gt; in the Tungsten documentation.&lt;br /&gt;&lt;br /&gt;We went through a lot of effort to make the backup and restore processes as simple as possible.  It's down to one keyword for each operation, so I don't think it's going to get much simpler.  Please try it out and provide your feedback.  
I love bug reports and want to hear what you think.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-5521540064976781845?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/5521540064976781845/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=5521540064976781845' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5521540064976781845'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5521540064976781845'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2009/07/backups-backups-backups-and-restore.html' title='Backups, Backups, Backups (and Restore)'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-5151745197465987146</id><published>2009-06-20T10:20:00.000-07:00</published><updated>2009-06-20T10:27:34.335-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><title type='text'>When SANs Go Bad</title><content type='html'>They sometimes go bad in completely unpredictable ways.  Here's a problem I have now seen twice in production situations.  
A host boots up nicely and mounts file systems from the SAN.  At some point a SAN switch (e.g., through a Fibre Channel controller) fails in such a way that the SAN goes away but the file system still appears visible to applications.&lt;br /&gt;&lt;br /&gt;This kind of problem is an example of a &lt;a href="http://en.wikipedia.org/wiki/Byzantine_Fault_Tolerance"&gt;Byzantine fault&lt;/a&gt; where a system does not fail cleanly but instead starts to behave in a completely arbitrary manner.  It seems that you can get into a state where the in-memory representation of the file system inodes is intact but the underlying storage is non-responsive.  The non-responsive file system in turn can make operating system processes go a little crazy.  They continue to operate but show bizarre failures or hang.  The result is problems that may not be diagnosed or even detected for hours.&lt;br /&gt;&lt;br /&gt;What to do about this type of failure?   Here are some ideas.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Be careful what you put on the SAN&lt;/span&gt;.  Log files and other local data should not go onto the SAN.  Use local files with syslog instead.  Think about it:  your application is sick and trying to write a log message to tell you about it on a non-responsive file system.  In fact, if you have a robust scale-out architecture, don't use a SAN at all.   Use database replication and/or DRBD instead to protect your data.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Test the SAN configuration carefully, especially failover scenarios&lt;/span&gt;.  What happens when the host fails over from one access path to another?  What happens when another host picks up the LUN from a "failed" host?  Do you have fencing properly enabled?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Actively look for SAN failures&lt;/span&gt;.  
Write test files to each mounted file system and read them back as part of your regular monitoring.   That way you know that the file system is fully "live."&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;The last idea gets at a core issue with SAN failures--they are rare, so it's not the first thing people  think of when there is a problem.  The first time this happened on one of my systems it was around 4am in the morning.   It took a really long time to figure out what was going on.  We didn't exactly feel like geniuses when we finally checked the file system.&lt;br /&gt;&lt;br /&gt;SANs are great technology, but there is an increasingly large "literature" of SAN failures on the net, such as &lt;a href="http://arjen-lentz.livejournal.com/132978.html"&gt;this overview from Arjen Lentz&lt;/a&gt; and &lt;a href="http://communities.vmware.com/thread/195806"&gt;this example of a typical failure.&lt;/a&gt;   You need to design mission-critical systems with SAN failures in mind.  Otherwise you may want to consider avoiding SAN use entirely.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-5151745197465987146?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/5151745197465987146/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=5151745197465987146' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5151745197465987146'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5151745197465987146'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2009/06/when-sans-go-bad.html' title='When SANs Go Bad'/><author><name>Robert 
Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-5007524964347009380</id><published>2009-06-17T17:11:00.000-07:00</published><updated>2009-06-18T06:01:38.372-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Proxies'/><title type='text'>Lots of New Tungsten Builds--Get 'Em While They're Hot</title><content type='html'>There is a raft of new &lt;a href="https://sourceforge.net/project/showfiles.php?group_id=256125"&gt;Tungsten open source builds&lt;/a&gt; available for your replication and clustering pleasure.  Over the last couple of days we uploaded new binary builds for &lt;a href="http://www.continuent.com/community/tungsten-replicator"&gt;Tungsten Replicator&lt;/a&gt;, &lt;a href="http://www.continuent.com/community/tungsten-connector"&gt;Tungsten Connector&lt;/a&gt;, &lt;a href="http://www.continuent.com/community/tungsten-monitor"&gt;Tungsten Monitor&lt;/a&gt;, and &lt;a href="http://www.continuent.com/community/tungsten-sql-router"&gt;Tungsten SQL Router&lt;/a&gt;.   
These contain the features described in &lt;a href="http://scale-out-blog.blogspot.com/2009/06/tungsten-development-news-lots-of-new.html"&gt;my previous blog article&lt;/a&gt;, including even more bug fixes (&lt;a href="http://forge.continuent.org/jira/secure/ReleaseNote.jspa?projectId=10110&amp;amp;styleName=Html&amp;amp;version=10295"&gt;36 on Tungsten Replicator alone&lt;/a&gt;) than I had expected as we had a debugging fest over the last few days that knocked off a bunch of issues.  You can pick up the builds on the &lt;a href="https://sourceforge.net/project/showfiles.php?group_id=256125"&gt;Tungsten download page&lt;/a&gt;.   Docs are posted on the &lt;a href="http://sourceforge.net/apps/mediawiki/tungsten/index.php?title=Main_Page"&gt;Tungsten wiki&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If you have questions, see problems with the builds, or just want to tell us how great they are, please post on the &lt;a href="http://www.continuent.com/community/forum"&gt;community forums&lt;/a&gt; or on the &lt;a href="https://sourceforge.net/mailarchive/forum.php?forum_name=tungsten-discuss"&gt;tungsten-discuss mailing list&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Our next open source release will be the Tungsten Manager, which is long overdue to join the family of regular builds.  
We are doing some polishing work on the state machine processing and group communications, after which the Manager will go out along with documentation on how to use it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-5007524964347009380?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/5007524964347009380/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=5007524964347009380' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5007524964347009380'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/5007524964347009380'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2009/06/lots-of-new-tungsten-builds-get-em.html' title='Lots of New Tungsten Builds--Get &apos;Em While They&apos;re Hot'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-7495889622209955434</id><published>2009-06-10T12:48:00.000-07:00</published><updated>2009-06-10T12:50:50.896-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Replication'/><category scheme='http://www.blogger.com/atom/ns#' term='PostgreSQL'/><title type='text'>Tungsten Development News - Lots of New Features!</title><content type='html'>Articles on this blog have been 
pretty scanty of late for a simple reason--we have been 100% heads-down in &lt;a href="http://www.continuent.com/community"&gt;Tungsten&lt;/a&gt; code since the &lt;a href="http://www.mysqlconf.com/mysql2009"&gt;recent MySQL Conference&lt;/a&gt;.   The result has been a number of excellent improvements that are already in Subversion and will appear as open source builds over the next couple of weeks.&lt;br /&gt;&lt;br /&gt;Tungsten has a simple goal:  create highly available, performant database clusters using unaltered commodity databases that are simple to manage and look as close to a single database as possible for applications.  Over the last two months we completed the integration of individual Tungsten components necessary to make this happen. &lt;br /&gt;&lt;br /&gt;Full integration is a big step forward and finally gets us to the ease-of-use we were seeking.  Imagine you want to add a slave database to the cluster.  There's no management procedure any more--you just turn it on.  Managers in the cluster automatically detect the new slave and add it as a data source.  That's the way we want every component to work from top to bottom--either on or off, end of story.  It was really nice to see it start to work a few weeks ago. &lt;br /&gt;&lt;br /&gt;We are now ready to start pushing builds out to the &lt;a href="https://sourceforge.net/projects/tungsten"&gt;Tungsten SourceForge.net project&lt;/a&gt;.  Here is a selection of the features: &lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.continuent.com/community/tungsten-replicator"&gt;Tungsten Replicator&lt;/a&gt; -- API support for seamless failover, certification on Solaris, better Windows support, testing against &lt;a href="http://askmonty.org/wiki/index.php/MariaDB"&gt;MariaDB&lt;/a&gt;, and many other improvements like flush events for seamless failover.  
There are already 26 fixes in &lt;a href="http://forge.continuent.org/jira/browse/TREP?report=com.atlassian.jira.plugin.system.project:roadmap-panel"&gt;JIRA&lt;/a&gt; and I expect more before we post the build.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.continuent.com/community/tungsten-sql-router"&gt;Tungsten SQL Router&lt;/a&gt; -- Pluggable load balancing with session consistency support.  Session consistency means users see their own writes but can read changes by other users from a slave.  It works using a single database connection, which is an important step toward eliminating application changes in order to scale on master/slave clusters.  &lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.continuent.com/community/tungsten-manager"&gt;Tungsten Manager&lt;/a&gt; -- Directory-based management model that allows you to view and manage both JMX-enabled services as well as regular operating system processes that follow the familiar LSB pattern of 'service name start/stop/restart'.   The managers use group communications and can broadcast commands across multiple hosts, handle failures, and automatically detect new services as they come online. &lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.continuent.com/community/tungsten-monitor"&gt;Tungsten Monitor&lt;/a&gt; -- Improved monitoring of replicator status including slave latency, which is necessary to guide SQL Router load balancing features like session consistency.&lt;br /&gt;&lt;br /&gt;There's a lot going on with Tungsten right now, in fact far too many things to mention even in a longish post like this one.  One of my current code projects is to implement &lt;a href="http://forge.continuent.org/jira/browse/TREP-278"&gt;built-in backup and restore&lt;/a&gt; for Tungsten Replicator.  I am planning on supporting slave auto-provisioning:  a new slave comes up, restores the latest backup, and starts replicating.   All you have to do is turn the slave on.   
(More of that on/off stuff--it's kind of an obsession for us at this point.)&lt;br /&gt;&lt;br /&gt;Integrating backup/restore is the final big feature for Tungsten Replicator 1.0--after this we plan to turn attention to parallel replication and are already discussing how this might work with several potential customers.  Feel free to contact me through this blog or better yet post on the &lt;a href="http://www.continuent.com/community/forum?func=view&amp;amp;id=246&amp;amp;catid=2"&gt;community forums parallel replication topic&lt;/a&gt; to join the conversation. &lt;br /&gt;&lt;br /&gt;One final bit of news, we are starting to work seriously on Tungsten PostgreSQL integration thanks to a &lt;a href="http://www.continuent.com/index.php?option=com_content&amp;amp;task=view&amp;amp;id=926&amp;amp;Itemid=27"&gt;new partnership between Continuent and 2nd Quadrant&lt;/a&gt;.  This work is commercially focused for now but will lead to additional open source features in the not too distant future.  Keep watching this space... :)&lt;br /&gt;&lt;br /&gt;p.s., We also had a nice refit on the &lt;a href="http://www.continuent.com/community"&gt;community website&lt;/a&gt;.  
Check it out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/768233104244702633-7495889622209955434?l=scale-out-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://scale-out-blog.blogspot.com/feeds/7495889622209955434/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=768233104244702633&amp;postID=7495889622209955434' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/7495889622209955434'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/768233104244702633/posts/default/7495889622209955434'/><link rel='alternate' type='text/html' href='http://scale-out-blog.blogspot.com/2009/06/tungsten-development-news-lots-of-new.html' title='Tungsten Development News - Lots of New Features!'/><author><name>Robert Hodges</name><uri>http://www.blogger.com/profile/05379726998057344092</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-768233104244702633.post-1973841458213402649</id><published>2009-05-13T18:29:00.000-07:00</published><updated>2009-05-13T18:31:39.706-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><title type='text'>Continuent is Joining the Open Database Alliance</title><content type='html'>Maybe it's a sense of shared adversity, but recent MySQL meetings have had this "we're all in it together" feeling.  
Today &lt;a href="http://monty-says.blogspot.com/2009/05/open-database-alliance-founded.html"&gt;Monty Widenius &lt;span style="text-decoration: underline;"&gt;announced&lt;/span&gt;&lt;/a&gt;&lt;a href="http://monty-says.blogspot.com/2009/05/open-database-alliance-founded.html"&gt; the Open Database Alliance&lt;/a&gt;:  the community feeling is starting to look like a real business entity.  &lt;br /&gt;&lt;br /&gt;The Open Database Alliance is appealing at multiple levels.  First, it's good for the companies that join--a steadier flow of business and ability to offer bigger solutions by combining with partners.  Second, it's good for users:  first rate software, services, and support without vendor lock-in.  Third, the parties are going to be excellent. &lt;br /&gt;&lt;br /&gt;Sometimes you have to think hard before signing up for partnerships.  But this one looks like a no-brainer.  Count us in!&lt;br /&gt;&lt;br /&gt;p.s., Stay tuned for &lt;a href="http://www.continuent.com/community"&gt;Tungsten&lt;/a&gt; certification against &lt;a href="http://askmonty.org/wiki/index.php/MariaDB"&gt;MariaDB&lt;/a&gt;.  If you have tried the Tungsten Replicator already wi
