Wednesday, September 24, 2008

Amazon or Google--Which Is More Interesting?

Brian Aker has a great post about how he finds Amazon more interesting than Google, because they have addicting services but no framework lock-in. I couldn't agree more with his conclusion, though for somewhat different reasons.

The contrast between Amazon and Google has intrigued me for a long time. The fact that Amazon is exposing basic infrastructure to build business systems has enormous advantages if that's what you are building. Google on the other hand has been a lot more oriented toward end users. Their services seem more useful to individual consumers.

I got really interested in Amazon services when SQS first appeared. It was clear somebody understood that service-based systems require messaging for integration as well as workflow processing. With messaging, "safe" storage, availability zones, and rapid setup of virtual machines, you can solve some mighty big problems. I can't see how to do this with on-line spreadsheets and free email.

OK, that's kind of a cheap shot. Still, Google services don't match Amazon by any stretch of the imagination when it comes to building scalable, general-purpose applications.

There's also an implicit difference between the Google and Amazon approaches. When you write software services for money there's usually a business plan somewhere or you don't do it for very long. Business plans in turn require you to make some assumptions about the environments you are using, like how much they will cost, what features they have now, and what they will have in the future. It's really important that these assumptions be reasonably stable or you can't make much progress.

Amazon may not be very open about how things work, but at least they are reasonably open about their plans. Now think about how many Google services are marked "BETA."

In fact, with Google I'm not even very sure what the term "beta" means. This is not just a problem for me. It's a problem with how Google interacts with the world that will hurt the company in the long run.

In the end I would be willing to go a bit further than Brian. If you write backend systems of any kind, Amazon is more than just interesting. I would be willing to bet that 20 years from now we will look back and say that Amazon provided the model that made the dubious idea once called utility computing really work.

p.s., Don't get me wrong about Google. I googled all the links for this article, which is written on Blogspot, another really nice end user service. Amazon might be the cat's meow but Google is a verb.

Open Source Databases at Oracle Open World

Open source databases still have a very long way to go to catch up to Oracle. I was at Oracle Open World touring the exhibits on Tuesday. Just for fun I asked everyone I met whether they used open source databases or saw demand for them in their businesses. The answer almost universally went like this: "No."

One simple reason explains much of the Oracle dominance as well as the inertia of many companies in switching to something else. A huge number of enterprise applications like Siebel or SAP run on Oracle. MySQL and PostgreSQL applications on the other hand are either custom code or belong to an area where open source is truly dominant, such as web site content management. Even when more applications run on open source, most companies will adopt them as supplements to existing systems, not as wholesale replacements. Oracle and other commercial databases will continue to rule enterprises for a very long time.

A lot of the focus in open source database development is on matching capabilities of commercial databases. What many open source users really need is the ability to integrate. That in turn depends on features like heterogeneous replication as well as bulk loading. These are not on the road maps of most open source database projects. However, they will be one of the factors that eventually enables open source to break out into a much bigger arena.

Friday, September 19, 2008

Tungsten Replicator 1.0 Alpha Is Released

The 1.0 Alpha of Tungsten Replicator is out. Actually it's been out since Tuesday but it's been a busy week. Binary downloads are available here.

The Alpha release offers basic statement replication for MySQL 5.0 on Linux, Solaris, MacOSX, and Windows platforms. The setup is very simple, and there are procedures for master failover as well as performing consistency checks. If you work at it, you'll find bugs. That's a promise, not a threat. Please log them in the project JIRA. We gladly accept feature requests, too.

Meanwhile, the beta version is in development. Among other nice features we will offer support for user-written SQL event extractors and appliers, MySQL row replication support, lots of testing, and much more.

Monday, September 15, 2008

Bringing Open Source Replication to the Oracle World

Replication is one of the most useful but also one of the most arcane database technologies. Every real database has it in some form. Despite its ubiquity, replication is complex to use and, in the case of commercial databases, quite expensive to boot.

We aim to change that. On Tuesday we will be announcing replication support for Oracle. Oracle replication will be based on our open source Tungsten Replicator, which is currently available in an alpha version for MySQL. Our goal is to provide replication that is accessible and usable by a wide range of users, especially those running lower-cost Oracle editions.

It's not a coincidence that we chose to implement MySQL and Oracle replication at the same time. MySQL has revolutionized the simplicity and accessibility of databases in general and replication in particular. For example, MySQL users have created cost-effective read scaling solutions using master/slave replication for years. MySQL replication is not free of problems, but there is no question that MySQL AB, helped by the community, got a lot of the basics really right.

On the other hand, Oracle replication products offer state-of-the-art solutions for availability, heterogeneous replication, application upgrade, and other problems, albeit for high-end users. For example, Oracle Streams and Golden Gate TDM offer very advanced solutions to the problem of data migration with minimal downtime. The big problem with these solutions is not capabilities but administrative complexity and cost.

Our initial cut at merging the two worlds is focused on creating a simple and usable database replication product that handles the following use cases for small to medium Oracle installations:
  • Basic data availability using extra copies of databases locally and off-site
  • Scaling reads using the MySQL read-scaling model
  • Performing zero-downtime upgrades and migrations using database replicas
  • Heterogeneous data migration between Oracle and MySQL as well as PostgreSQL (initially one-way only).
The big technical feature is that replication will work on all editions of Oracle, not just Enterprise Edition. We expect to help Oracle users build economical new systems on the scale-out model as well as off-load existing Oracle servers to avoid having to upgrade to more expensive licensing.
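
To make the read-scaling use case a bit more concrete, here is a minimal Python sketch of how an application might split traffic: writes go to the master and reads rotate across replicas. The class name, the host labels, and the simple rule for spotting reads are all illustrative assumptions, not part of Tungsten or of any MySQL or Oracle API.

```python
from itertools import cycle


class ReadWriteRouter:
    """Hypothetical router: reads rotate across replicas, the rest hits the master."""

    # Statement verbs treated as reads in this sketch (an assumption).
    READ_VERBS = ("SELECT", "SHOW")

    def __init__(self, master, replicas):
        if not replicas:
            raise ValueError("need at least one replica")
        self.master = master
        self._replicas = cycle(replicas)  # simple round-robin over replicas

    def route(self, sql):
        """Return the host label that should execute this statement."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in self.READ_VERBS:
            return next(self._replicas)
        return self.master


router = ReadWriteRouter("master", ["replica1", "replica2"])
for statement in ("select * from foobar13",
                  "insert into foobar13 values(2, 'second')",
                  "select count(*) from foobar13"):
    print(router.route(statement), "<-", statement)
```

With asynchronous replication such as MySQL's, a replica can lag behind the master, so any read that must see the latest write should still be sent to the master.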

An early adopter version will be available toward the end of the month. The Oracle redo log extractor is commercial but all other capabilities are open source, so you can replicate from MySQL up to Oracle freely. We are now looking for some select users who can really help propel the software forward. If you would like to try out Oracle replication, contact me at Continuent.

Sunday, September 14, 2008

Java Service Wrapper Is *Very* Handy

If you write network services using Java, you should look into the Java Service Wrapper (JSW). The JSW turns Java programs from weak, delicate creatures easily killed by an errant Ctrl-C into robust network services that boot up automatically, ignore most signals, and restart automatically following crashes. It's free for open source programs and has very reasonable licensing fees for commercial software.

We use JSW on several of our projects including the Tungsten Replicator and the Tungsten Connector. I just checked in a new project on our Tungsten Commons site with an Ant script that automatically copies the open source versions of JSW into a project directory with a conventional layout including bin and lib directories. Check it out here if you would like an example of how to automate addition of JSW wrappers to your own Java projects.

Thursday, September 11, 2008

MySQL 5.0 to 4.1 "Down-Version" Replication using Tungsten

A couple of months ago Mark Callaghan mentioned it would be very nice to have a replication product that could transfer data from newer to older versions of MySQL. Ever since then I have been interested in trying it with our new Tungsten Replicator. Today I finally got the chance.

I have a couple of CentOS5 virtual machines running on my Mac that I have been using to test the latest Tungsten Replicator 0.9.1 build. I happen to have MySQL 5.0.22 (the antiquated version that comes with CentOS5) on one VM. I set up MySQL 4.1.22 on the other CentOS5 VM and tried to make it a slave of the 5.0 server using MySQL replication. The result was the following error message:

080911 15:25:13 [ERROR] Master reported an unrecognized MySQL version. Note that 4.1 slaves can't replicate a 5.0 or newer master.

This message was highly satisfactory. MySQL replication is not supposed to work down-version from 5.0 to 4.1.

Now to try it with the Tungsten Replicator. I followed the Tungsten Replicator manual instructions with the MySQL 5.0 host as master and the MySQL 4.1 host as the slave. It turns out the set-up is identical for both versions, which made this part very fast. I then issued the standard commands to bring up the master:
trepsvc start
trep_ctl.sh configure
trep_ctl.sh goOnline
trep_ctl.sh goMaster
followed by commands to start the slave:
trepsvc start
trep_ctl.sh configure
trep_ctl.sh goOnline
Now it was time to fire up mysql against the master database and enter some data.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 25 to server version: 5.0.22-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table foobar13 (id int, data varchar(25));
Query OK, 0 rows affected (0.12 sec)

mysql> insert into foobar13 values(1, 'first!!!');
Query OK, 1 row affected (0.00 sec)
However, over on the slave, nothing showed up. OK, I know we have never tested against MySQL 4.1, but what's up? Well, in the slave replicator log the following message appeared:

INFO | jvm 1 | 2008/09/11 23:05:06 | 2008-09-11 23:05:06,536 FATAL tungsten.replicator.NodeManager You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SCHEMA IF NOT EXISTS tungsten' at line 1

Oops! The replicator tried to issue a CREATE SCHEMA command to create its catalog database. CREATE SCHEMA was only introduced in MySQL 5.0.2, so the 4.1 server rejects it. I changed this to CREATE DATABASE, ran Ant to build and redeploy the code, restarted the slave, and checked the logs. They looked clean this time. Next I logged in to the slave database with mysql and looked for the foobar13 table:

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 20 to server version: 4.1.22-standard-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select * from foobar13;
+------+----------+
| id   | data     |
+------+----------+
|    1 | first!!! |
+------+----------+


Cool, it worked. Replication from MySQL 5.0 to MySQL 4.1 successfully demonstrated.
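
For anyone curious about the version issue behind the CREATE SCHEMA failure, here is a small Python sketch. It parses the version banner that the mysql client prints and checks it against 5.0.2, the release where CREATE SCHEMA appeared. The function names are hypothetical and this is not the replicator's actual code, which is Java. The fix described above is simpler still: always issue CREATE DATABASE, which both servers accept.

```python
import re


def parse_mysql_version(banner):
    """Extract (major, minor, patch) from a banner like '4.1.22-standard-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"unrecognized MySQL version: {banner!r}")
    return tuple(int(part) for part in match.groups())


def supports_create_schema(banner):
    """CREATE SCHEMA is a synonym for CREATE DATABASE as of MySQL 5.0.2."""
    return parse_mysql_version(banner) >= (5, 0, 2)


def catalog_ddl(name="tungsten"):
    """Portable form: CREATE DATABASE works on both 4.1 and 5.0 servers."""
    return f"CREATE DATABASE IF NOT EXISTS {name}"


for banner in ("5.0.22-log", "4.1.22-standard-log"):
    print(banner, "accepts CREATE SCHEMA:", supports_create_schema(banner))
print(catalog_ddl())
```

Sticking to the portable statement avoids branching on the server version at all, which is why it makes a good default for a replicator that has to talk to older slaves.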

We will have a much improved Tungsten Replicator 1.0 alpha build ready in a couple of days that includes this fix and many others. By the way, we are working on getting heterogeneous replication to work as well. I'll have a lot more to say about that in future posts.

Monday, September 1, 2008

Continuent Community Site for Database Scale-Out

Our goal at Continuent is to be the go-to guys for database scale-out. Last Thursday we opened up a new community site for scale-out software at http://community.continuent.com. The site is driven by Joomla and has a number of very nice additions like Fireboard Forums and Mediawikis for each project. The first day or two was a bit bumpy as we nailed down some final issues, but most features are now working. We hope the result will be a nice place to meet other people who are interested in database scale-out and share ideas as well as software.

As you will see when visiting the community site, we have a variety of projects that we collectively call the Tungsten Scale-Out Stack. We have had this idea for a while now that it's not enough to have just one or two singing and dancing products--that's too narrow to solve scale-out problems. Instead you want a set of technologies that combine to create a wide variety of solutions.

Our effort last week included posting initial code for the Tungsten Replicator. We are actively testing, fixing bugs, and adding more features. However, there are a number of other projects on the site. I will talk about them on this blog in the future as each one is quite interesting.

Meanwhile, if you have a project that you might like to post on our site, let me know. We are actively looking for scale-out technology for MySQL, PostgreSQL, and commercial databases like Oracle. Just post on the end of this blog and I will see it.

Scaling Databases Using Commodity Hardware and Shared-Nothing Design