There have been a number of comments that
Amazon RDS does not allow users access to MySQL replication capabilities (for example
here and
here). This is a pity. Replication is one of the great strengths of MySQL and the lack of it is a show-stopper for many users. As of the latest build of
Tungsten Replicator half of this problem is on the way to being solved. You can now set up real-time replication from an external MySQL master into an Amazon RDS instance.
In the remainder of this article I will explain how to set up Tungsten replication to an Amazon RDS slave, then add a few thoughts about why this feature is useful along with some suggestions for improvement. To keep the article reasonably short I assume you understand the basics of installing Tungsten Replicator. If you need more information, check out the
online documentation.
Readying an RDS Test Instance
Amazon RDS is Amazon's on-demand relational database. RDS supports several database types including MySQL, which I will use for this demonstration. Launching a new instance is simple. Login to the
Amazon AWS Console using an account that has RDS enabled, then switch to the
Amazon RDS Console. Press the Launch a DB Instance button, whereupon a screen like the following appears:
|
RDS Database Selection |
Press the Select button for MySQL Community Edition, which starts the configuration window. (You can also replicate into Oracle if you are up for a challenge. If you do this, please post what you did to get it to work!) Next fill out properties for MySQL.
|
MySQL Instance Configuration |
Among other things, you create a master login and password. Note these carefully as you will need them to configure replication. Then continue for another couple of screens, at which point you can launch your instance. It takes about 10 minutes for new instances to spin up, after which you can see the instance properties in the AWS RDS Console. Here's a screen shot of my test instance.
|
AWS RDS Console |
Once the instance is up, test access. This is of course necessary to prove the instance is running properly and that we can login from a remote location. Note the host name in the Endpoint field. This is a DNS entry for the new MySQL instance. Using that and the master login, we can now fire up the mysql client from a remote host where we plan to run replication.
$ mysql -utungsten -p -htest.c4villnbpuq1.us-west-1.rds.amazonaws.com
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2125
Server version: 5.5.27-log Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| innodb |
| mysql |
| performance_schema |
| test |
+--------------------+
5 rows in set (0.01 sec)
This looks quite good. We are now ready to install and start replication.
Important note! If you have trouble connecting you may need to tweak your Amazon security group settings to open up ports, especially if you are replicating from non-Amazon locations. Amazon has a very cool feature that can guess your originating host IP and offer a
CIDR address that covers the port range from which you are operating. I used this when configuring my security groups.
Setting Up Tungsten Replication
It is possible to set up replication from a MySQL master directly to Amazon RDS using a single Tungsten Replicator process. However, it is more versatile and simpler to set up two replicators: one to read from the MySQL master, and another replicator to apply to the Amazon RDS slave. I will therefore demonstrate this configuration.
We will assume you have a MySQL master already running and that it meets
prerequisites for running Tungsten. Let's now grab the Tungsten code and install a master replicator. You can get fresh builds from the
Tungsten Replicator builds page.
We will take a recent replicator build that contains the RDS changes, which are documented in
Issue 425. Use 2.0.7 build 177 or later. The main improvement is to add a non-privileged slave mode that avoids invoking any of the operations forbidden by Amazon. Among other things Tungsten normally uses commands like 'SET SESSION SQL_LOG_BIN=0' to suppress writing to the binlog when it applies data on a slave. This command requires SUPER privilege, hence causes problems for underprivileged RDS logins.
Unpack the code and install the master replicator in /opt/continuent. This is no different from installing a normal Tungsten master. My master host is logos1. Here are sample commands to pull the code and set up the master. The example shows the minimum options--if you have MySQL installed in a non-standard location or otherwise differ from a stock installation you may need to add additional options.
mkdir ~/staging
cd staging
wget --no-check-certificate https://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/tungsten-replicator-2.0.7-177.tar.gz
tar -xf tungsten-replicator-2.0.7-177.tar.gz
tungsten-replicator-2.0.7-177/tools/tungsten-installer \
--master-slave \
--master-host=logos1 \
--datasource-user=tungsten \
--datasource-password=your_passord \
--service-name=aws \
--home-directory=/opt/continuent \
--cluster-hosts=logos1 \
--start-and-report
Next, set up the slave replicator. For convenience I am going to install the slave on a separate host, named logos2, to avoid port clashes between the two replicators. If you install on the same host, you'll need to install into another release directory location and use the --rmi-port and --thl-port options to avoid port overlaps. Here is the command to set up the Amazon RDS slave. Note that the tungsten-installer program can install code between hosts, which is an extremely useful feature.
tungsten-replicator-2.0.7-177/tools/tungsten-installer \
--master-slave \
--cluster-hosts=logos2 \
--master-host=logos1 \
--datasource-host=test.c4villnbpuq1.us-west-1.rds.amazonaws.com \
--datasource-user=tungsten \
--datasource-password=your_password \
--service-name=aws \
--slave-privileged-updates=false \
--home-directory=/opt/continuent \
--skip-validation-check=InstallerMasterSlaveCheck \
--skip-validation-check=MySQLPermissionsCheck \
--start-and-report
You may see a few warnings during the RDS installation, as the tungsten-installer cannot verify some settings on the Amazon RDS host. These can be ignored. If everything goes well, you now have two replicators up and running. You can check the status of the master and slave using the trepctl command, as in:
/opt/continuent/tungsten/tungsten-replicator/bin/trepctl -host logos1 status
/opt/continuent/tungsten/tungsten-replicator/bin/trepctl -host logos2 status
Both replicators should report that they are online. Now complete the exercise by proving that replication works between the replicators. We start by logging into the local MySQL instance, creating a new table in the test schema, and adding data.
$ mysql -uroot test
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 231488
Server version: 5.5.21-rel25.1-log Percona Server with XtraDB (GPL), Release rel25.1, Revision 234
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create table foo(id int primary key);
Query OK, 0 rows affected (0.24 sec)
mysql> insert into foo values (256);
Query OK, 1 row affected (0.00 sec)
Now login to the Amazon RDS instance and look for table foo.
$ mysql -utungsten -p -htest.c4villnbpuq1.us-west-1.rds.amazonaws.com test
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2161
Server version: 5.5.27-log Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> select * from foo;
+-----+
| id |
+-----+
| 256 |
+-----+
1 row in set (0.01 sec)
Mission accomplished! We have real-time replication enabled from a MySQL master to Amazon RDS. At this point you can replicate more or less normally. There are some obvious limitations due to the fact that Amazon RDS is locked down and does not grant our login full privileges.
- Temp table replication may not work. Tungsten depends on being able to issue commands of the form "set @@session.pseudo_thread_id=23531" and the like. This prevents clashes between temp tables of the same name on different master sessions. You may need to enable row replication on the master, which suppresses temp table replication. (For other approaches, see my previous article on temp tables and the binlog.)
- Any command that requires SUPER privilege will not work. As an obvious example, you will not be able to grant SUPER privilege to new accounts. Such commands will break replication.
- All replicated commands go into the binlog, which is a potential performance drag and may slow down Amazon RDS slaves. Parallel replication may not help in this case, since committing to the binlog is a serialization point that blocks other transactions. This problem may be cured if Amazon picks up group commit fixes from MySQL 5.6 and/or MariaDB.
All things considered, however, these are minor inconveniences. Most applications should be able to replicate without difficulties, especially if the master transaction rate is not too high.
Configuring SSL for Connections to RDS
In the previous demonstration I used a master host running outside Amazon. This means my test transactions traveled across the Internet, where they were visible to all and sundry along the way. To illustrate, we can run tcpdump and watch traffic as it goes by.
$ sudo tcpdump -A -vvv -s 256 host test.c4villnbpuq1.us-west-1.rds.amazonaws.com
...
logos2.46657 > ec2-54-241-56-140.us-west-1.compute.amazonaws.com.mysql: Flags [P.], cksum 0xb0f7 (incorrect -> 0x995d), seq 3012:3073, ack 3412, win 94, options [nop,nop,TS val 89630514 ecr 74223746], length 61
E..qw>@.@......n6.8..A..4..L...f...^.......
.W.2.l..9....insert into foo values (256) /* ___SERVICE___ = [aws] */
22:55:36.316049 IP (tos 0x8, ttl 51, id 26226, offset 0, flags [DF], proto TCP (6), length 63)
If we were handling confidential data, exposing traffic like this to possible evildoers would be a serious problem. Fortunately, Amazon RDS supports SSL encrypted connections from clients. Here is how to use it with Tungsten.
First, you need to get the Amazon RDS certificate, which is used to sign certificates for individual RDS instances.
mkdir /opt/continuent/certs
cd /opt/continuent/certs
wget https://rds.amazonaws.com/doc/mysql-ssl-ca-cert.pem
Next, you need to create a trust store that Java can access containing the certificates of signing authorities whom you trust. For this you will need the
Java keytool utility, which is included in the JDK. If you are just using the Java runtime in production, you will need to generate the store on another host, then copy it over to your test hosts. I used the password "secret" in this example.
$ keytool -import -alias rds -file mysql-ssl-ca-cert.pem -keystore truststore
Enter keystore password:
Re-enter new password:
Owner: CN=aws.amazon.com/rds/, OU=RDS, O=Amazon.com, L=Seattle, ST=Washington, C=US
Issuer: CN=aws.amazon.com/rds/, OU=RDS, O=Amazon.com, L=Seattle, ST=Washington, C=US
...
Trust this certificate? [no]: yes
Certificate was added to keystore
We now need to tell the slave replicator about the truststore file using Java VM options. On the slave host, edit /opt/continuent/tungsten/tungsten-replicator/conf/wrapper.conf and add the extra options shown in bold face.
# Java Additional Parameters
wrapper.java.additional.1=-Dreplicator.home.dir=../../tungsten-replicator/
wrapper.java.additional.2=-Dreplicator.log.dir=../../tungsten-replicator/log
wrapper.java.additional.3=-Dcom.sun.management.jmxremote
wrapper.java.additional.4=-Djavax.net.ssl.trustStore=/opt/continuent/certs/truststore
wrapper.java.additional.5=-Djavax.net.ssl.trustStorePassword=secret
The last step is to enable SSL encryption when applying data. We need to set an extra URL option on the drizzle JDBC driver to turn on SSL. For this we need to edit the static-svc.properties file that configures replication. In my example this file is located in /opt/continuent/tungsten/tungsten-replicator/conf/static-aws.properties. Open the file and look for the section that starts with APPLIERS. Edit the text to add additional urlOptions line as shown below.
############
# APPLIERS #
############
replicator.applier.dbms=com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier
replicator.applier.dbms.host=${replicator.global.db.host}
replicator.applier.dbms.port=${replicator.global.db.port}
replicator.applier.dbms.user=${replicator.global.db.user}
replicator.applier.dbms.password=${replicator.global.db.password}
replicator.applier.dbms.urlOptions=?useSSL=true
replicator.applier.dbms.ignoreSessionVars=autocommit
replicator.applier.dbms.getColumnMetadataFromDB=true
Restart the replicator process (/opt/continuent/tungsten/tungsten-replicator/bin/replicator restart) and you will now be using SSL encryption. If we now look back at the tcpdump outout, it looks like garbage as the following example shows.
...
23:46:52.370904 IP (tos 0x0, ttl 64, id 5899, offset 0, flags [DF], proto TCP (6), length 105)
logos2.44717 > ec2-54-241-56-140.us-west-1.compute.amazonaws.com.mysql: Flags [P.], cksum 0xb0ef (incorrect -> 0xd834), seq 3087:3140, ack 4838, win 102, options [nop,nop,TS val 89938121 ecr 74992771], length 53
E..i..@.@.r....n6.8......zSR.b.....f.......
.\X..xL.....07..|..x.)...T..888.H/...iz...^.W8....'...... ..J
...
This is much better. We are replicating to an Amazon RDS, and the transactions are safe from prying eyes. If you have gotten this far you are ready to try your own applications.
Benefits of Replication into Amazon RDS
Amazon RDS is convenient thanks to its quick and simple setup, but the lack of replication is a severe limitation in building systems that need more than a single MySQL instance. In particular it makes it hard to integrate RDS into systems that consist of more than Amazon RDS itself. Adding the ability to replicate in real-time into RDS therefore has a number of benefits. The most obvious include using RDS to extend existing systems.
First, Amazon RDS can offer a quick way to add read capacity to existing MySQL applications. This is especially useful if you have a cluster, such as Tungsten, which handles your transaction processing and overall HA. You can now add Amazon RDS read slaves that you discard when no longer needed. Tungsten Replicator has a number of other useful features like the ability to read from a group of nodes, not just one, that make such topologies easy to set up and maintain. Clusters other than Tungsten will likewise benefit from this feature.
Second, Amazon RDS is suitable for applications that do not need 24x7 high availability (limitations include slow failover, no online maintenance, no cross-cloud capabilities, etc.) You can now pull data from other sources and send them to Amazon RDS slaves for processing, which amounts to extending overall processing capacity. For example, you could use RDS to run back-office tasks using transactions replicated in from MySQL masters. Tungsten Replicator also replicates data from Oracle, so this is an additional source of transactions.
There are of course other ways to replicate data to and from RDS, for example using batch ETL tools like
Talend. However, these are not real-time and often require application changes to add timestamp columns or otherwise mark transactions that need to be extracted. Log-based replication as implemented by Tungsten is fast and has minimal impact on applications or MySQL itself.
Thoughts about Further Improvements for Amazon RDS Replication
On our side, i.e., at Continuent, we need to do more testing, add documentation, and fix problems as they arise. We are starting a beta test with one of our customers in the next few days, who incidentally was the same customer who requested this feature in the first place after hacking it for themselves. RDS also has some interesting provisioning capabilities that I would like to understand better. We are also adding options that eliminate the need for manual configuration of security settings. This will keep us busy for a while.
Other improvements depend on changes to RDS itself. An obvious and huge improvement would be to permit replication
out of Amazon RDS. Unfortunately, Tungsten needs a login that has REPLICATION_SLAVE privilege so that we can download binlog data. That privilege is not yet available to Amazon users. Once it is, Tungsten extraction will also work in very short order. We actually don't need other commands like START/STOP SLAVE or FLUSH LOGS--just ability to issue a
COM_BINLOG_EVENT from a client connection and receive binlog records. I am sure other products would use this capability as well. (RDS developers, if you are listening here's an easy way to extend your product usability significantly...)
Replication is such a valuable feature of MySQL that Amazon RDS feels somewhat crippled without it. For this reason I would imagine that Amazon will open up additional capabilities in the future. Until then we will polish up replication from MySQL masters to Amazon RDS slaves, awaiting a time when we can add more features.
In the meantime, I hope you will try the new replication to Amazon RDS. As noted in this article you can grab the latest builds and try it yourself. Please report your experiences through either Continuent Support if you are a customer or the
Tungsten discussion list if you use the
open source Tungsten Replicator. I look forward to your feedback and suggestions for making Amazon RDS support better.