Thursday, October 30, 2008

Why Is Solaris Missing the Party?

I just spent several hours in a fruitless quest to figure out if there's a way to run Solaris 10 on Amazon. Fruitless is the right word because "real" Solaris operating systems do not seem to be supported other than through QEMU emulation, which looks a bit shaky. So far there's only OpenSolaris.

Why is this a problem? Our company, Continuent, is moving at full speed onto Amazon S3 and EC2. We have a virtual organization with developers spread out from California to Lithuania. Amazon solves a really fundamental problem for us. We can have development machines that everyone can reach easily and activate or deactivate at will. Scott in Santa Cruz does not have to call Seppo in Helsinki just to get a host rebooted. (This is where globalization starts to go really bad.) We are also developing software like Tungsten Replicator that needs to run in cloud environments. Being on Amazon makes sense at multiple levels.

The fly in the ointment is that many of our customers use Solaris 9 and 10. The OpenSolaris instances on Amazon are essentially useless. OpenSolaris is so different from production Solaris that tests give little or no useful information. I have never heard of a customer deploying on it. So we are stuck on the old model of keeping machines in house. As a result Linux and even Windows look cheaper and involve far less hassle as development platforms.

It feels as if Sun is really missing the boat on this one. If I were working for the Solaris team, getting Solaris for Intel available on Amazon would be at the top of my list. If enough IT people start to make the calculations we are, the future of Solaris is not going to be very bright. That would be a pity both for Sun and for a lot of users.

Wednesday, October 29, 2008

Simple is Beautiful

Last week I attended an incredibly intense conference in Lalandia, Denmark: Miracle Oracle Open World. According to Mogens Norgaard, the organizer, the conference devotes 80% of the time to intense discussions of Oracle databases and 80% of the time to drinking. During the festivities you get this dim mental image of what it would have been like if Vikings had access to 16-core machines and advanced database software. But I digress.

Anyway, Lalandia is located on just that kind of spare, beautiful coast that clears the mind to look for fundamental truths. And sure enough, a talk by Carel-Jan Engel nailed one of them: simplicity is the key to availability.

At some level we all understand the idea. The more components you have in a system, the more likely it is that one or more of them will fail, either because of a defect or an administrative error. The trouble is we don't act on our intuitions. Carel-Jan showed the Oracle MAA (Maximum Availability Architecture), which looks like this in the marketing pictures:

MAA is the recommended way to create a highly available system using RAC and Data Guard. And suddenly it hits you--there are a lot of moving parts. In seeking redundancy, the authors of the design have created tremendous complexity and hence opportunities for failures. It's an example of what Jeremiah Wilton once allegedly described as "design for maximum failability." I don't know if Jeremiah really said that but it describes the problem pretty well.

And this was Carel-Jan's point. Availability is not something you just purchase and roll in the door on wheels. You get it by engineering very simple systems that have few points of failure. In the Oracle world it often means buying Oracle SE instead of RAC. And running it on standard hardware linked together with replication. Plus, of course, changing your applications so they work within the limitations of the rest of the system. Want to stay available without losing data? Keep the rate of updates low. Performance overload? Partition data into separate systems. You get the idea.
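The intuition about component counts can be made concrete with a little arithmetic. The sketch below is my own illustration, not something from Carel-Jan's talk. It assumes every component must be up for the system to work and that failures are independent, which ignores redundancy and so overstates the penalty for a design like MAA. The 99.9% per-component figure and the component counts are made-up numbers.

```python
from math import prod

HOURS_PER_YEAR = 8760


def series_availability(component_availabilities):
    """Availability of a system that needs every component up.

    Assumes failures are independent, so the individual
    availabilities simply multiply.
    """
    return prod(component_availabilities)


def downtime_hours_per_year(availability):
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Hypothetical numbers: every component is 99.9% available.
simple_system = series_availability([0.999] * 3)    # few moving parts
complex_system = series_availability([0.999] * 20)  # many moving parts

for name, a in [("simple", simple_system), ("complex", complex_system)]:
    print(f"{name}: availability {a:.4f}, "
          f"about {downtime_hours_per_year(a):.0f} hours down per year")
```

Under these assumptions the 20-component system loses several times as many hours per year as the 3-component one, even though each part is just as reliable. Redundancy can win some of that back, but only if the extra parts and the failover logic do not become new points of failure themselves.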

In short, keep it really simple, like this:


This is simple availability. It's very beautiful. Open source database communities have understood this idea for a long time. My goal is to write software that makes it work better for them and for Oracle users as well.

Friday, October 17, 2008

Getting Smart about the New World of PostgreSQL Replication

Robert Treat and I had some back and forth emails a few weeks ago about explaining database replication to customers. Replication is totally cool but it is also completely confusing to a lot of people. The basic concepts are not widely understood. Plus PostgreSQL does not help by giving you a wide range of methods, often with poorly documented trade-offs.

Based on our conversation I put together a talk for PG West in Portland called Getting Smart about the New World of PostgreSQL Replication. It explains basic concepts and surveys five replication approaches. Press the title and you can possess the slides yourself.

Robert and I had talked about putting together a joint talk about replication. Consider this a first cut. I'm up for iterating a few times to get a solid tutorial.

Meanwhile at Continuent we should be able to replicate data from Oracle or MySQL into PostgreSQL using Tungsten Replicator within about two weeks. I'm waiting for one more check-in to enable writing plug-ins that apply SQL to new databases. Sad to say, reading data out of PostgreSQL is going to take a little longer. Stay tuned...

Scaling Databases Using Commodity Hardware and Shared-Nothing Design