Tuesday, January 11, 2011

Fixing Replication with Replication

A couple of days ago I ran into a Tungsten Replicator case where several MySQL tables became corrupted on slaves and needed to be restored from the master.   We identified the tables that had problems fairly quickly using Tungsten Replicator's consistency checks.  However, that led to another problem:  how to restore the slave tables efficiently from the master.  The MySQL server in question processes around 10M tranactions per day--there is virtually no downtime.  Though the tables were not large, we could not be sure whether they were in use. 

Fortunately, you can use a simple MySQL trick to get all the rows of a table to replicate through to slaves.  The idea is to dump the table, delete the rows, then reload it again.  The delete and subsequent reload replicate out to slaves, after which everything is consistent again.  Let's say we have a table called tpcb.history that needs to be fixed.  Login with mysql and run the following commands:
SELECT * FROM tpcb.history 
  INTO OUTFILE '/tmp/tpcb.history.dmp' FOR UPDATE;
DELETE FROM tpcb.history;
LOAD DATA INFILE '/tmp/tpcb.history.dmp' REPLACE
You can do the reload several ways in MySQL, but this particular code has some advantages over other approaches, such as using LOCK TABLES.  First, it uses a transaction, so if something goes wrong the changes roll back and you do not lose your data.  Second, the SELECT ... FOR UPDATE locks your data and ensures serialization.  You can run this while applications are running without problems.

This seems useful enough that I put together a simple script called reload-table.sh with a README and checked them into the Tungsten Replicator codeline on SourgeForge.net.  You can refresh the same table shown above using the following command:
[sudo] ./reload-table-b.sh -u tungsten -p secret -t tpcb.history
I tested the reload using Tungsten 1.3.1 on MySQL 5.1 with statement replication.  However, it would work equally well with row replication.  Moreover, you can do the same trick in MySQL replication, as this involves base replication capabilities that are directly equivalent.  There are a few caveats:  you need to use InnoDB (or another transactional engine), large tables may be a problem, and you would need to tread carefully in cases where tables contain referential constraints.  Finally, it would be wise to save the master table somewhere else before running the script.


Daniƫl van Eeden said...

What about mk-table-sync or REPLACE INTO?

Krishna Chandra Prajapati said...

I agree, when we are dealing with millions of records on per days basis. Below is the best option.

select * into outfile ....
load data infile ...

Similar, scenario discussed in my post.

Robert Hodges said...

REPLACE INTO won't help with statement replication. You'll just reload the same corrupt table on the slave. You actually have to put the rows somewhere and reload them.

Thanks for the reminder on mk-table-sync. One issue on Maatkit is sometimes if you know what's wrong it's simpler to use the SQL directly so you can see what you are doing.

Robert Hodges said...

@Krishna, don't forget the DELETE between loading. It appears that LOAD DATA INFILE does not change anything if you don't do this. You can prove this by doing a SHOW MASTER STATUS to see if anything went into the binlog.

Jonathan said...

I've done that myself once or twice. It works most of the time, but sometimes, if the outfile is too big and the my.cnf file is configured to limit the binlog file by size, it may cause problem.
Other then that, its a pretty fast way to sync replication if you are completely lost about where you are.

swanhart said...

This won't work with row-based-replication if there are rows present on the slave that aren't present on the master.


There are still cases where TRUNCATE can fall back to delete (such as in the presence of FK with InnoDB) and this might still result in inconsistency with RBR.

Justin Swanhart said...

You can also do it with regular replication:

on a good slave:

on the broken slave a few seconds later:

record the relay_master_log_file, and exec_master_log_pos values from the broken slave

on the good slave:

using the values that you recorded on the bad slave, now both slaves are at the same transactional position with respect to the master.

dump the tables on the good slave. restore the tables on the bad slave.

start slave on both servers.


(open id for livejournal is broken now)

Robert Hodges said...

@swanhart Good point on RBR--probably should just do the whole thing with statement replication using 'set session binlog_format=statement'.

Meanwhile, I wanted to use TRUNCATE but was scared off by the confusing description in http://dev.mysql.com/doc/refman/5.1/en/truncate-table.html. The doc indicates at one point that TRUNCATE appears in the log as DROP TABLE/CREATE TABLE but in fact as of 5.1.54 you'll see TRUNCATE in the log. Moreover, it auto-commits--there are BEGIN markers for a TRUNCATE issued in a transaction. That's not good as you'll lose your table as mentioned in the original article above.

At this point it seems you have to look at the binlog to be sure of the semantics.

Slacker said...

@swanhart: TRUNCATE is, without a doubt, faster than DROP TABLE, but it isn't transaction safe, so it isn't appropriate for this kind of thing. See: http://dev.mysql.com/doc/refman/5.1/en/truncate-table.html, specifically "Truncate operations cause an implicit commit."

The commit would stop the transaction and make the operation not atomic.

Scaling Databases Using Commodity Hardware and Shared-Nothing Design