
I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world. Giorgio is a DZone MVB and is not an employee of DZone and has posted 637 posts at DZone.

Sitting on the couch

12.21.2010

CouchDB is one of the most famous open source document-oriented databases available on the web.

This article describes my experience with CouchDB during a university project, after having worked only with relational databases for years. I never had time to try NoSQL at work, since we are tightly coupled to the relational database and to an ORM on top of it.

This experience has been a spike: it is not test-driven, and the code will be thrown away once the project is finished. It is incremental, though, since this is an effective way to learn a new technology.

What is different in CouchDB

No schema, obviously

I found it difficult to create fixture data manually without a reference schema; an automated script would probably be better. The flexibility of CouchDB is, of course, that you don't have to follow a schema like an SQL table definition, and can add fields (the equivalent of columns) to single documents at will.
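
A minimal sketch of such an automated script, assuming Python (the project itself did not necessarily use it). The `make_fixtures` helper and all field names are hypothetical; the `{"docs": [...]}` wrapper is the shape that CouchDB's `_bulk_docs` endpoint accepts. Note how only some documents carry the optional `courses` field, which is exactly what the lack of a schema allows.

```python
import json
import random


def make_fixtures(count, seed=0):
    """Build `count` fixture documents; the field names are hypothetical."""
    rng = random.Random(seed)  # fixed seed so the fixtures are reproducible
    docs = []
    for i in range(count):
        doc = {"type": "student", "name": f"Student {i}"}
        # No schema: only some documents carry the optional field.
        if rng.random() < 0.5:
            doc["courses"] = rng.sample(["databases", "networks", "compilers"], k=2)
        docs.append(doc)
    return docs


# Payload in the {"docs": [...]} shape accepted by the _bulk_docs endpoint.
payload = json.dumps({"docs": make_fixtures(5)}, indent=2)
print(payload)
```

The printed JSON could then be sent to the database in a single request instead of typing each document by hand.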

Easy Linux installation

As you would with a LAMP stack, you can get a running CouchDB instance on your Linux box with a single command. For example, on Ubuntu

sudo apt-get install couchdb

will get you a running server on port 5984. Ubuntu already uses CouchDB in production to synchronize Ubuntu One stuff.

Keys are not sequential anymore, but UUIDs or manually specified

These identifiers are designed for easy distribution of data across multiple nodes, but I have not explored that in this primer. A sequence of natural numbers for IDs is only handy in a single-node database.
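
To illustrate why such identifiers suit multiple nodes, here is a small sketch that generates a document ID on the client with no central counter involved. Generating the ID client-side with Python's `uuid` module is an assumption of this example; CouchDB can also assign an ID itself when you don't provide one.

```python
import uuid


def new_doc_id():
    """Return a 32-character hexadecimal UUID, generated client-side."""
    return uuid.uuid4().hex


doc_id = new_doc_id()
print(doc_id)
```

Two nodes calling `new_doc_id()` independently will, for all practical purposes, never produce the same key, which a shared auto-increment sequence cannot guarantee without coordination.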

Eventual consistency

Your views will be consistent with your state eventually (at an unspecified point in the future). This is definitely dangerous for most enterprise databases, but if you're managing a social network timeline, it may be perfectly acceptable.

Values are not atomic

While in a relational table's row the value of a column has a strict type, you can store whatever you want as a property of a document: for example, arrays or other objects. In fact, CouchDB treats a document as a JSON value, as it is built for the web.
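
As an illustration, the following sketch builds a document whose properties hold an array and a nested object, then round-trips it through JSON. The document and its field names are made up for this example.

```python
import json

# Hypothetical document: the field names are made up for illustration.
post = {
    "_id": "sitting-on-the-couch",
    "title": "Sitting on the couch",
    "tags": ["couchdb", "nosql"],                   # an array as a value
    "author": {"name": "Giorgio", "role": "MVB"},   # a nested object as a value
}

serialized = json.dumps(post)      # what would travel over the wire
restored = json.loads(serialized)  # the same structure comes back intact
print(restored["tags"][0], restored["author"]["name"])
```

In a relational schema in first normal form, `tags` and `author` would typically live in separate tables joined by foreign keys.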

CouchDB talks HTTP

The plethora of drivers available for CouchDB reflects the fact that its protocol is nearly universal, and not binary like those of common databases such as MySQL, SQL Server and Oracle. And of course, thanks to this design choice, CouchDB can be spoken to from JavaScript code.
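
Since the protocol is plain HTTP, a standard library is enough to act as a "driver". The sketch below only prepares the request that would store a document, without sending it, so it can be run with no server available. The database name `blog` and the document ID `first-post` are hypothetical; the port is the default one mentioned above.

```python
import json
import urllib.request

# Assumptions: a server on localhost:5984, a database called "blog" and a
# document id "first-post" (the last two are hypothetical names).
BASE_URL = "http://localhost:5984"


def build_put_document(db, doc_id, doc):
    """Prepare (but do not send) the HTTP request that stores a document."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{db}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_put_document("blog", "first-post", {"title": "Hello"})
print(request.get_method(), request.full_url)
# To actually send it against a running server:
# urllib.request.urlopen(request)
```

The same request could be issued from a browser, a shell, or any language with an HTTP client, which is the point of the design choice.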

MapReduce

Views are loosely the CouchDB equivalent of SQL queries. However, you don't write them in a declarative language like SQL, but as a pair of map and reduce functions that follow the Map/Reduce paradigm.
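
To make the paradigm concrete, here is a small simulation written in Python. It is not CouchDB's view API (real views are written in JavaScript); the documents and the function names `map_doc`, `reduce_values` and `run_view` are hypothetical. The map step emits key/value pairs for each document, and the reduce step combines the values that share a key.

```python
from collections import defaultdict

# Hypothetical documents; in CouchDB these would be JSON documents.
docs = [
    {"_id": "a", "type": "post", "author": "alice", "views": 10},
    {"_id": "b", "type": "post", "author": "alice", "views": 5},
    {"_id": "c", "type": "post", "author": "bob", "views": 7},
    {"_id": "d", "type": "comment", "author": "bob"},
]


def map_doc(doc):
    """Map step: yield (key, value) pairs for one document, or nothing."""
    if doc.get("type") == "post":
        yield doc["author"], doc["views"]


def reduce_values(values):
    """Reduce step: combine all values emitted under the same key."""
    return sum(values)


def run_view(documents):
    """Apply the map step to every document, then reduce per key."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in sorted(grouped.items())}


print(run_view(docs))
```

The rough SQL counterpart would be a `SELECT author, SUM(views) ... WHERE type = 'post' GROUP BY author`, with the filtering and the projection both expressed inside the map function.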

Now why would you want to do that low-level work? First, you'll have a lot more flexibility in filtering data, and in building the data structure you choose to return. The flexibility - which can be dangerous in the wrong hands - is thus reflected both at the schema and the query level.

More importantly, map-reduce is about as far as you can get with a general querying paradigm that is still intrinsically parallelizable. Map functions can be executed on different nodes, and reducers can be placed on different nodes too. When a field changes, the number of operations performed to update the views on N documents is O(log N), since the modified document is the only one to be mapped again, while the other intermediate (and unchanged) results are already stored.

This lower level of abstraction is beneficial to performance, but it implies that you have to think about queries before starting to populate your database, because some queries may be impossible or perform very poorly with a particular data model. The latter is often true for SQL-based queries as well.
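
The incremental part of this argument can be sketched with a toy index that caches the map output of each document, so that a change triggers a single new map call. The `ViewIndex` class is hypothetical and deliberately simplified: it shows only that unchanged documents are not mapped again, and its `total` method rescans every cached row, which a tree-structured index storing intermediate reduce results would avoid.

```python
class ViewIndex:
    """Toy index that caches map output per document id."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows_by_doc = {}  # document id -> list of (key, value) rows
        self.map_calls = 0     # how many times the map function has run

    def update(self, doc):
        """(Re)map a single new or changed document; others are untouched."""
        self.map_calls += 1
        self.rows_by_doc[doc["_id"]] = list(self.map_fn(doc))

    def total(self):
        """Reduce step: sum every cached value (a full scan in this toy)."""
        return sum(value for rows in self.rows_by_doc.values() for _, value in rows)


def map_views(doc):
    yield doc["_id"], doc["views"]


index = ViewIndex(map_views)
for i in range(1000):
    index.update({"_id": f"doc-{i}", "views": 1})
calls_after_build = index.map_calls

index.update({"_id": "doc-42", "views": 11})  # one field changes
print(index.map_calls - calls_after_build, index.total())
```

After the initial build over 1000 documents, changing one field costs exactly one extra map call, while the cached rows of the other 999 documents are reused as they are.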

CouchApp

CouchApp is a simple Python command line tool to manage the development of a web application that uses CouchDB. Writing views directly in Futon, the phpMyAdmin of CouchDB which runs out of the box, is a terrible task since you'll have to embed JavaScript code into strings.

With CouchApp you can instead generate the folder structure of your application, write each function in its own .js file with syntax checking, and then push your app to a live CouchDB instance.

This tool puts a simple AJAX app (HTML, JavaScript and CSS code) into CouchDB itself, so that the application's code is served along with the data.

I coded with the aid of a local Git repository and then pushed my examples to the localhost server with couchapp push $databaseName (the term push is coincidentally the same as in Git, but in CouchApp it means deploy). I definitely advise you to use CouchApp if you want to try out CouchDB without other hassles.

My conclusions

I like how CouchDB, and the NoSQL movement in general, change assumptions about what we need from a database. We now have heavy, strictly consistent relational databases, as well as simple, fast and eventually consistent ones like CouchDB. Finding a match for our use cases is now simpler than it would ever be with a thousand other open source relational databases.

Published at DZone with permission of Giorgio Sironi, author and DZone MVB.


Comments

Lawrence Maccherone replied on Tue, 2010/12/21 - 2:54pm

We're also playing with CouchDB in a business project and I wanted to add a little nuance to two of your points above:

  1. Atomicity. Atomicity as defined by ACID properties has to do with the fact that transactions are all or nothing. It's about making sure that changes that you make are in agreement with each other. For instance, if you add a row to an invoice, you also want to make sure that a de-normalized summary of the total charges for all items on the invoice agrees. If the change to the sum is not allowed to complete due to a lock conflict, then the row shouldn't be added either. In CouchDB, the example I just gave would be "Atomic" if the rows were stored in the same document as the sum value. I know that's not what you mean by "Values are not atomic" but I just wanted to make sure that was clear to folks who didn't know anything about CouchDB.
  2. Eventual Consistency. One of the things that we love about CouchDB is that it is easy to reason about how an application will behave under its consistency model. There is always local consistency. If you need distributed consistency, you can achieve it, but in general, by understanding the consistency model (incremental replication), you can design it so that it's there when you need it and you know when it can't be relied upon.

Giorgio Sironi replied on Wed, 2010/12/22 - 4:01am in response to: Lawrence Maccherone

For 1. I meant in fact that values of document fields are not thought of only as scalars, as for relational database columns (in 1NF), but also as arrays or other objects. :)
