Sunday, April 1, 2012

Disproving the CAP Theorem

Since the famous conjecture by Eric Brewer and proof by Nancy Lynch et al., CAP has given the world countless learned discussions about distributed systems and many a well-funded start-up.  Yet who truly understands what CAP means?  Even a cursory survey of the blogosphere shows profound disagreement about the meaning of terms like CP, AP, and CA in real systems.  Those who disagree on CAP include some of the most illustrious personages of the database community.

We can therefore state with some confidence that CAP is confusing. Yet this observation itself raises deeper questions.  Is CAP merely confusing?  Or is it the case that as with other initially accepted but now doubtful ideas like the Copernican model, evolution, and continental drift, that CAP is actually not correct?  Thoughtful readers will agree this question has not received anywhere near the level of scientific scrutiny it deserves.

Fortunately for science private citizens like me have been forging ahead without regard to the opinions of so-called experts or even common sense.  My work on CAP relies on two trusted analytic tools of database engineers over the legal drinking age: formal logic and beer.  Given the nature of the problem we should obviously use a minimum of the former and a maximum of the latter.  We have established that CAP is confusing.  To understand why we must now deepen our confusion and study its habits carefully.  Other investigators have used this approach with great success.

Let us begin by translating the terms of CAP into the propositional calculus.  The terms C (consistency), A (availability) and P (partition tolerance) can be used to state the famous "two out of three" of CAP using logical implication as shown below.

(1) A and P => not C
(2) P and C => not A
(3) C and A => not P

So far so good.  We can now dispense briefly with logic and turn to confusion.  It seems there is difficulty distinguishing the difference between CA and CP systems, i.e., that they are therefore equivalent. This is a key insight, which we can express formally as follows:

(4) C and A <=> C and P

which further reduces to

(5) A <=> P

In short our confusion has led us directly to the invaluable result that A and P, hence availability and partition tolerance, are exactly equivalent!  I am sure you share my excitement at the direction this work is taking.  We can now through a trivial substitution of A for P in equation 2 above reveal the following:

(6) A and C => not A
(7) C => (A => not A)

We have just shown that consistency implies that any system that is available is also unavailable simultaneously.  This is an obvious contradiction, which means the vast logical edifice on which CAP relies crumbles like a soggy nacho.  Considering the amount of beer consumed at the average database conference it is surprising nobody thought of this before.

At this point we can now raise the conversation up a level from looking for spare change under the table and comment on the greater meaning of our results in the real world.  Which is the following: Given the way most of us programmers write software it's a wonder CAP is an issue at all. Honestly, I can't even get calendar programs to send invitations to each other across time zones.  I plan to bring the combustible analytic capabilities of logic and beer to bear on the mystery of time at a later date.  For now we can just speculate it is due to a mistaken design based on CAP.

11 comments:

Anonymous said...

A database system which is offline can be perfectly consistent and unchanging.

Robert Young said...

(Ah another Blogger/Firefox user who can't get in line comments to post.)

One is tempted to take this as an April Fool's joke. OTOH, industrial strength RDBMS with 2 Phase Commit have been SuperCAP over many "partitions" for a couple of decades.

Baron said...

In the immortal words a slot machine spoke to me and my wife after we fed it the first dollar we ever spent gambling, and stabbed randomly at buttons whose meanings were unclear to us, and thus won $91 and change in a few seconds, "you're brilliant!"

In the less-immortal words of the computer scientist with whom we had dinner a few months later, and then discovered that he'd programmed the very slot game we played, "yeah, that's a hard one. How much money did you lose on it?"

piccolapatria said...

Nice easter egg ;)

Anyway having:

(C ^ P=>!A) ^ (A ^ P=>!C)

Leads to :

(A=> !C v !P ) ^ (C => !A v !P)

and thus:

A ^ C => (!C v !P) ^ (!A v !P) => !P

So you just need two CAP statements ;)

Robert Hodges said...

@All, thanks for reading! I hope this proof is useful to all of you in your next start-up.

@piccolopatria, Very parsimonious! I didn't see that reduction first time through. When I submit for publication I'll be sure to include it. Reviewers like to see solid mathematics as I'm sure you are aware.

piccolapatria said...

I was just adding some entropy to the post...didn't mean to be boring ;)

Anonymous said...

nice bullshit ;)

You can not do that:
(4) C and A <=> C and P
which further reduces to
(5) A <=> P

becasue when C is False then A can be true and P can be False so (4) statement is true and (5) is not

piccolapatria said...

@anonymous:



the author doesn't enforce 4), but is just stating that the blog post he quotes does.

Robert Hodges said...

@All, Anonymous has discovered an easter egg within the easter egg. Based on this new insight we can adjust the proof as follows.

(5) C => (A <=> P)

In other words if C is true A and P are equivalent. The rest of the proof then proceeds as follows:

(6) C => (A and C => not A)
(7) C => (C => (A=> not A))
(8) C => (A => not A)

So the conclusion is the same (whew!) and we still have soggy nachos. I invite other great minds of the age to submit further thoughts on the matter.

(arg, had to repost to remove typos!)

Andre said...

With
(1) A and P => not C
(2) P and C => not A
(3) C and A => not P

you translate the CAP theorem into
(1) and (2) and (3),

while it should be translated into
(1) or (2) or (3).

Robert Hodges said...

@Andre, I think you are perhaps misreading the notation. Equations 1-3 are axioms as they are stated without proof. They have an "AND" relationship to each other as CAP asserts they are true for all time.

As @Piccolapatria pointed out there are more parsimonious formulations but this part of the "proof" correct as far as I can tell.

Scaling Databases Using Commodity Hardware and Shared-Nothing Design