Wednesday, October 29, 2008

Simple is Beautiful

Last week I attended an incredibly intense conference in Lalandia, Denmark: Miracle Oracle Open World. According to Mogens Norgaard, the organizer, the conference devotes 80% of the time to intense discussions of Oracle databases and 80% of the time to drinking. During the festivities you get this dim mental image of what it would have been like if Vikings had access to 16-core machines and advanced database software. But I digress.

Anyway, Lalandia is located on just the kind of spare, beautiful coast that clears the mind to look for fundamental truths. And sure enough, a talk by Carel-Jan Engel nailed one of them: simplicity is the key to availability.

At some level we all understand the idea. The more components you have in a system, the more likely it is that one or more of them will fail, either because of a defect or an administrative error. The trouble is we don't act on our intuitions. Carel-Jan showed the Oracle MAA (Maximum Availability Architecture), which looks like this in the marketing pictures:
[Diagram: Oracle MAA, with RAC clusters linked by Data Guard]

MAA is the recommended way to create a highly available system using RAC and Data Guard. And suddenly it hits you: there are a lot of moving parts. In seeking redundancy, the authors of the design have created tremendous complexity, and hence opportunities for failure. It's an example of what Jeremiah Wilton once allegedly described as "design for maximum failability." I don't know whether Jeremiah really said that, but it describes the problem pretty well.

And this was Carel-Jan's point. Availability is not something you just purchase and roll in the door on wheels. You get it by engineering very simple systems that have few points of failure. In the Oracle world that often means buying Oracle SE instead of RAC, running it on standard hardware linked together with replication, and, of course, changing your applications so they work within the limitations of the rest of the system. Want to stay available without losing data? Keep the rate of updates low. Performance overload? Partition data into separate systems. You get the idea.
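To make the contrast concrete, here is a toy sketch of the master/slave idea: the master appends every update to a log, and a slave applies the log from its last replayed position. This is illustrative Python only, not any real replication product; all class and method names are invented.

```python
class Master:
    """Toy master: applies updates locally and records them in a log."""
    def __init__(self):
        self.data = {}
        self.log = []                  # ordered list of (key, value) updates

    def update(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # entry will be shipped to slaves


class Slave:
    """Toy slave: replays the master's log from its last applied position."""
    def __init__(self):
        self.data = {}
        self.applied = 0               # log position already replayed

    def catch_up(self, master):
        for key, value in master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(master.log)


master, slave = Master(), Slave()
master.update("balance", 100)
master.update("balance", 90)
slave.catch_up(master)                 # slave now mirrors the master
```

The entire availability story here is one extra moving part: a log plus a replay position. A production system adds durability and ordering guarantees, but the number of components stays countable.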

In short, keep it really simple, like this:
[Diagram: a single master replicating to a slave on standard hardware]


This is simple availability. It's very beautiful. Open source database communities have understood this idea for a long time. My goal is to write software that makes it work better for them and for Oracle users as well.

8 comments:

Regina Obe said...

Good point. Coming from a mechanical engineering education, I understand this all too well. The main thing I learned from system dynamics classes was to decouple and define the minimum number of moving parts that achieves your goal.

When I look at some enterprise system architectures, all I can think is that software architects seem to be in love with complexity. They should take some real engineering courses :)

Mike S said...

The funny thing is that your diagram looks more like a marketing diagram, because it abstracts away what the MAA diagram details.

Also, comparing your architecture to MAA is silly... MAA will have totally seamless, almost instant failover on server node/hardware failure with RAC. Data Guard replication takes care of failure of the other components.

If you have low requirements (i.e. not "maximum") for RTO and RPO, your solution works fine. The whole POINT of MAA is to provide the best RTO/RPO.

Robert Hodges said...

@Mike S
The master/slave diagram accurately shows all the moving parts and is in fact vastly simpler than RAC + Data Guard. Also, not all RAC failovers are instantaneous. In the event a node drops off the network (e.g., due to a hard crash), RAC may not fail over until network timeouts expire. Timeouts are the only way to prove failure in this case; this is a basic property of distributed systems. Until they expire, part or all of the cluster may hang, depending on which resources the failed node was locking.
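The timeout argument can be shown in a few lines. In an asynchronous network a crashed node and a merely slow node look identical, so the only evidence of failure available to the cluster is a missed-heartbeat deadline. A minimal sketch, using a hypothetical function rather than RAC's actual logic:

```python
def presumed_failed(last_heartbeat, now, timeout):
    """True once a node has missed heartbeats for longer than `timeout`.
    Before the deadline the node might just be slow, so the cluster
    cannot safely act; after it, failure is presumed, never proven."""
    return now - last_heartbeat > timeout

# Two seconds of silence with a five-second timeout: still presumed
# alive, so anything waiting on the node's locks keeps waiting.
presumed_failed(last_heartbeat=100.0, now=102.0, timeout=5.0)  # → False
# Only after the timeout expires may the cluster evict the node.
presumed_failed(last_heartbeat=100.0, now=106.0, timeout=5.0)  # → True
```

Lowering the timeout starts failover sooner but makes it more likely that a slow node is evicted by mistake; no setting eliminates the window during which lock holders hang.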

More importantly, the issue is not what happens during an expected failure. The question is what happens when the extra components that compose RAC and Data Guard themselves fail or are improperly configured. When you start to count these types of problems, and also factor in the downtime to deploy the system in the first place, the availability numbers look quite different.

木匠 said...

You create another layer of complexity: Replication.

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:14672061404704
"Replication is a study in complexity."

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:431772600346864169

木匠 said...

Thanks,
Charlie

Robert Hodges said...

@Charlie
The articles you cite are consistent with the idea of keeping it simple. Even so, you still need to deal with losing your one and only database, and making copies of data is a standard solution to that problem.

P.S. Try our replication. You'll see it's not very hard to set up.

Niall said...

@Robert

The Oracle VIP is somewhat different from a conventional VIP in that it fails over to the other node not to carry on servicing connection requests but purely to give clients a fast timeout (since the listener on node2 isn't aware of node1's VIP). (This may only be true since 10.2(.03).)

@mike

It seems to me that both the Continuent and the MAA diagrams rather gloss over how the data is being replicated, representing replication as a box and log ship/apply as an arrow. I wouldn't say that one is any more abstract than the other. Oh, and seamless failover in the event of server failure (say, a CPU spike that causes the node to become unresponsive) is rather a marketing dream, at least up to 10.2. (I haven't run 11 RAC in anger yet, but I don't see how it would address the basic problem of remotely detecting a node that is in reality non-responsive.)
