The PostgreSQL community is getting really serious about replication. On Thursday May 29th, Tom Lane issued a manifesto concerning database replication on behalf of the PostgreSQL core team to the pgsql-hackers mailing list. Tom's post basically said that lack of easy-to-use, built-in replication is a significant obstacle to wider adoption of PostgreSQL and proposed a technical solution based on log shipping, which is already a well-developed and useful feature.
What was the reaction? The post generated close to 140 responses within the next two days, with a large percentage of the community weighing in. It's one of the most significant announcements on the list in recent history. There is pent up demand for this feature and within a few hours people were already deep into the details of the implementation.
The basic idea comes from an excellent presentation by Takahiro Itagaki and Masao Fujii of NTT at PGCon 2008 in Ottawa. They have developed a system that replicates database log records synchronously to a standby database. The standby can recover quickly and without data loss, which makes it a good availability solution. The core team manifesto proposes to integrate this into the PostgreSQL core and add the ability to open the standby for reads.
So, is this the end of the story on replication? I don't think so. There's no question that synchronous log shipping with reads would be a great feature. Basic availability is the first problem users run into when setting up production systems and this feature looks considerably better than alternatives for other databases like MySQL. It will help if NTT donates their code to the community, but still the whole effort will take considerable time. Adding the ability to open a standby for reads is at least a version out (read: up to 2 years).
More importantly, log shipping is most useful for availability. It does not help you replicate across database versions (nice for upgrades), between different databases, from a master to large numbers of slaves, or bi-directionally between databases. Finally, it's a less than ideal solution for clustering data between sites, something that is rapidly becoming one of the most important overall uses of replication. For these and other cases you need logical replication, which turns log records into SQL statements and applies them using a client.
I'm therefore starting an effort to get logical replication hooks included as a parallel effort. If you are interested in this let me know. Meanwhile, stay tuned. Tom's message represents a real change of heart for the PostgreSQL community. Accepting the important of replication opens up the doors for a new round of innovation in scale-out based on PostgreSQL. It could not come at a better time.
Trove and OpenStack
15 hours ago