NoSQL Zone is brought to you in partnership with:

Developer with experience in a variety of different systems and technologies, with a customer focus and balance with business goals. Particularly interested in backend and large scale systems, and also interested in high level architecture, and API design. Always open to feedback in order to keep learning and improving as a professional. Rodrigo is a DZone MVB and is not an employee of DZone and has posted 39 posts at DZone. You can read more from them at their website. View Full User Profile

CouchDB and Conflict Resolution

09.28.2012
| 4525 views |
  • submit to reddit

Learning CouchDB showed me that it is different than other NoSQL DBs I had studied so far with respect to sharding and replication:

  • It does not have sharding, but rather each server has all the data.
  • It has multiple-master (or active-active) configuration, where you can write to any server
The interesting part about the active-active is how conflicts are resolved. Rather than just showing that there's a conflict, all CouchDB servers have a consistent view of the world, even after conflicts. How does it do it?
  • Revision numbers: first, revision numbers start with the change number to the record. Example: 2-de0ea16f8621cbac506d23a0fbbde08a. Therefore, latest changes to any record win (change 2-de0ea16f8621cbac506d23a0fbbde08a wins over 1-7c971bb974251ae8541b8fe045964219).
  • MD5: after the change number, there is change MD5. In case concurrent changes to the record occur across two different servers, there's an ASCII comparison to determine which one wins (change 2-de0ea16f8621cbac506d23a0fbbde08a wins over 2-7c971bb974251ae8541b8fe045964219). Of course there's no concept of time here to determine which change happened first, but at least each server is able to pick a winning version deterministically. Using a MD5 hash of the document can be quite helpful to avoid replicating the same change (if the same change was made to the different CouchDB databases).
  • Conflict List: besides picking automatically a winning list, any client can know when there were conflicts and do something about - either merging them or picking a different version. That can be done, for instance, using a view that outputs conflict and subscribing to it through the Change API. I understand that this is optional and doesn't stop the system in case of conflicts.

    Conflict Resolution on Couch

For systems where concurrent updates are not common, this is definitely a valid and good approach. Also, there is conflict avoidance through ETAGs (just like Windows Azure Storage), which goes a long way to avoid conflicts.

My concern with CouchDB and active-active is a prolonged split-brain situation, where 2 or more databases are taking writes to the records (potentially the same). Clients can see inconsistent results if they are hitting these different deployments.

Going a bit further on the Split-Brain scenario, if these databases are used by other downstream components in your system, the same conflict resolution problem may occur on their side, as they could be consuming records from these different databases (that are not resolving conflicts for some time). Reason about all these scenarios can be challenging.

The interesting thing about this is that, if indeed there's a split-brain but clients can talk to all deployments, CouchDB could take write and return an error in case replication is currently not working, so clients could have an embedded logic to try to connect to other CouchDB deployments and do some of this replication, which will keep all the deployment in a sort of consistent state.

Some references:
http://wiki.apache.org/couchdb/Replication_and_conflicts
http://guide.couchdb.org/draft/conflicts.html
http://www.quora.com/CouchDB/How-does-CouchDB-solve-conflicts-in-a-deterministic-way

 

 

AttachmentSize
couple-fighting-on-couch.jpg38.89 KB
Published at DZone with permission of Rodrigo De Castro, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)