Monday, June 2, 2008

PostgreSQL Gets Religion About Replication

The PostgreSQL community is getting really serious about replication. On Thursday, May 29th, Tom Lane issued a manifesto concerning database replication on behalf of the PostgreSQL core team to the pgsql-hackers mailing list. Tom's post basically said that the lack of easy-to-use, built-in replication is a significant obstacle to wider adoption of PostgreSQL, and it proposed a technical solution based on log shipping, which is already a well-developed and useful feature.

What was the reaction? The post generated close to 140 responses within the next two days, with a large percentage of the community weighing in. It's one of the most significant announcements on the list in recent history. There is pent-up demand for this feature, and within a few hours people were already deep into the details of the implementation.

The basic idea comes from an excellent presentation by Takahiro Itagaki and Masao Fujii of NTT at PGCon 2008 in Ottawa. They have developed a system that replicates database log records synchronously to a standby database. The standby can recover quickly and without data loss, which makes it a good availability solution. The core team manifesto proposes to integrate this into the PostgreSQL core and add the ability to open the standby for reads.

So, is this the end of the story on replication? I don't think so. There's no question that synchronous log shipping with reads would be a great feature. Basic availability is the first problem users run into when setting up production systems, and this feature looks considerably better than the alternatives for other databases like MySQL. It will help if NTT donates their code to the community, but the whole effort will still take considerable time. Adding the ability to open a standby for reads is at least a version out (read: up to 2 years).

More importantly, log shipping is most useful for availability. It does not help you replicate across database versions (nice for upgrades), between different databases, from a master to large numbers of slaves, or bi-directionally between databases. Finally, it's a less than ideal solution for clustering data between sites, something that is rapidly becoming one of the most important overall uses of replication. For these and other cases you need logical replication, which turns log records into SQL statements and applies them using a client.
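To make the idea concrete, here is a minimal sketch in Python of the "turn changes into SQL and apply them with a client" step. Everything in it is an assumption for illustration: the change-record format (`op`, `table`, `key`, `row`) is hypothetical, the tables are assumed to have an `id` primary key, and `sqlite3` stands in for the target database only so the example runs without a server. It says nothing about how PostgreSQL log records would actually be decoded.

```python
import sqlite3


def to_sql(change):
    """Turn one (hypothetical) decoded change record into (statement, params)."""
    table = change["table"]
    row = change.get("row", {})
    if change["op"] == "insert":
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        return f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(row.values())
    if change["op"] == "update":
        sets = ", ".join(f"{col} = ?" for col in row)
        return f"UPDATE {table} SET {sets} WHERE id = ?", list(row.values()) + [change["key"]]
    if change["op"] == "delete":
        return f"DELETE FROM {table} WHERE id = ?", [change["key"]]
    raise ValueError(f"unknown op: {change['op']}")


def apply_changes(conn, changes):
    """Apply a batch of changes to the replica inside a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        for change in changes:
            sql, params = to_sql(change)
            conn.execute(sql, params)


def demo():
    # The "replica" is an ordinary database reached through a normal client
    # connection; it does not need to match the master's version or vendor.
    replica = sqlite3.connect(":memory:")
    replica.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    changes = [
        {"op": "insert", "table": "accounts", "row": {"id": 1, "balance": 100}},
        {"op": "insert", "table": "accounts", "row": {"id": 2, "balance": 50}},
        {"op": "update", "table": "accounts", "key": 1, "row": {"balance": 75}},
        {"op": "delete", "table": "accounts", "key": 2},
    ]
    apply_changes(replica, changes)
    return replica.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
```

Because the replica only ever sees ordinary SQL over an ordinary connection, the same stream of changes could in principle feed a different database version or a different database entirely, which is exactly what shipping raw log files cannot do.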

I'm therefore starting an effort to get logical replication hooks included as a parallel effort. If you are interested in this, let me know. Meanwhile, stay tuned. Tom's message represents a real change of heart for the PostgreSQL community. Accepting the importance of replication opens the door to a new round of innovation in scale-out based on PostgreSQL. It could not come at a better time.


Mark Callaghan said...

I think this is great, but I would describe it as different from standard MySQL replication, not better (limited to 2 nodes, cannot be used on a WAN, ...). And there are things in MySQL that are similar to this -- you can deploy MySQL + DRBD to support failover locally and MySQL replication to support it remotely.

Robert Hodges said...

Definitely agree, though the exact differences are somewhat difficult to understand at this point in the development process. You really need both types of replication.

doswheeler said...

Agreed. Well needed and finally explained.

SpinLock said...

I'm an old Oracle fart, with experience running Oracle Parallel Server on an OpenVMS cluster in a hot-hot or hot-warm configuration.

When I got into PostgreSQL about 7 years ago for a personal web project, I was disappointed to find it had little to offer in the way of replication (save the commercial packages, of course). I chose PSQL over MySQL for reasons such as foreign keys, stored procs, etc., but I sorely missed replication.

Now that it's 2008, the team is discussing poor-man's replication (log shipping), and it's likely two years away? I don't mean this unkindly, but jeepers, it takes a decade to go from no replication to poor-man's? How far out is a hot-hot configuration and a distributed lock manager? 20 years? PostgreSQL is going to miss the boat if they don't get crackin'!

Keep in mind Ingres got open-sourced a few years ago, so there's a fully-featured competitor in the mix.

Mark Callaghan said...

Let us hope that PostgreSQL does not design a replication architecture that uses XA to move each transaction from the primary to one slave.

robert.hodges said...

Just a clarification to spinlock--as you are probably aware, PostgreSQL already supports log shipping and has for some time. However, it is not possible to open up the secondary for reads. Also, you can lose up to one WAL segment file's worth of data, which is 16 MB by default.

Like a lot of people, I don't think the proposal from PostgreSQL is a full solution. We are currently working on database-neutral replication based on SQL statements. We have it working in alpha form for MySQL and plan to post an initial open source version in August. However, PostgreSQL support is a ways out.

Edward Kovarski said...

I agree with Spinlock. They need to get moving on features such as this to capture the hearts of the market. High availability is a necessity these days rather than a nice-to-have.

p.s. I believe you meant to say "Ottawa" instead of "Ottowa" in the post.

Robert Hodges said...

Thanks for the spelling correction! It's fixed.

PJ said...

SpinLock: The PostgreSQL team always felt it was better to leave replication out of the core and have third-party projects for that, which is why it took so long. And yes, it might take even longer to get a full-featured solution, as the feeling about this hasn't changed.
Also, there are several projects for master-slave replication (e.g. slony), hot standby (e.g. walmgr), warm standby (using PITR), partitioning over different servers (pl/proxy), etc.
The reason for including poor man's replication is to make it easier to use postgres for people who need just simple replication, because things like slony can be hard to install.

Nicholas said...

Huh? Don't we already have slony 1? Why re-invent the wheel?
