Mitch Pronschinske is a Senior Content Analyst at DZone. That means he writes and searches for the finest developer content in the land so that you don't have to. He often eats peanut butter and bananas, likes to make his own ringtones, enjoys card and board games, and is married to an underwear model. Mitch is a DZone Zone Leader and has posted 2573 posts at DZone.

Cassandra Adds Hadoop MapReduce

04.13.2010
Today the Cassandra project announced its first new release since becoming a Top-Level Project at Apache.  Don't let the low version number fool you.  Cassandra 0.6 is one of the most mature NoSQL distributed data stores in the open source market.  It was heavily developed by Facebook before it was open sourced in August 2008.  Currently Cassandra is being used by four of the largest social media sites in the world: Facebook, Digg, Reddit, and Twitter.

One of the primary new features in Cassandra 0.6 is support for Apache Hadoop.  This is a major upgrade for Cassandra, giving it even more "big data" capabilities.  The new feature will allow Cassandra to run analytics against its own data using Hadoop's reliable MapReduce framework.
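For readers new to the model, here is a minimal sketch of the map, shuffle, and reduce phases in plain Python. It does not use the Hadoop or Cassandra APIs; the `ROWS` data, the `text` column, and all function names are hypothetical stand-ins for rows that a real job would read from the data store.

```python
from collections import defaultdict

# Hypothetical rows standing in for stored data: row key -> {column: value}.
ROWS = {
    "user1": {"text": "cassandra adds hadoop"},
    "user2": {"text": "hadoop runs mapreduce"},
    "user3": {"text": "cassandra scales out"},
}


def map_row(key, columns):
    """Map phase: emit a (word, 1) pair for every word in the row's text."""
    for word in columns["text"].split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_word(word, counts):
    """Reduce phase: collapse all counts for one word into a total."""
    return word, sum(counts)


def word_count(rows):
    """Run the three phases in sequence over an in-memory set of rows."""
    pairs = (pair for key, cols in rows.items() for pair in map_row(key, cols))
    return dict(reduce_word(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(ROWS))
```

In a real deployment the framework distributes the map and reduce calls across many machines; the sketch only shows how the phases fit together.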

                      

Cassandra 0.6 simplifies its architecture with a new integrated row cache.  With the implementation of this new feature, Cassandra no longer needs a separate caching layer.  Along with the simplified architecture, Cassandra 0.6 also features a performance boost.  The distributed data store can already process thousands of writes per second, and this version's enhancements build on that number.

"Apache Cassandra 0.6 is 30% faster across the board, building on our already-impressive speed," said Jonathan Ellis, Apache Cassandra Project Management Committee Chair, in the press release.  "It achieves scale-out without making the kind of design compromises that result in operations teams getting paged at 2 AM."  The Storage Team Technical Lead at Twitter, Ryan King, explained Twitter's reasons for using Cassandra: "At Twitter, we're deploying Cassandra to tackle scalability, flexibility and operability issues in a way that's more highly available and cost effective than our current systems."

One of Cassandra's best known features is its lack of any single point of failure.  The data store's distributed system smoothly replaces any node that goes down with a new node.  The system also has the flexibility to be tuned for more consistency or more availability.
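The consistency-versus-availability trade-off can be sketched with the usual quorum rule for replicated stores: if data is kept on N replicas, a read that contacts R replicas is guaranteed to overlap the most recent write acknowledged by W replicas whenever R + W > N. The helper names below are hypothetical and the code is an illustration of that arithmetic, not Cassandra's API.

```python
def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def reads_see_latest_write(replicas: int, read_count: int, write_count: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_count + write_count > replicas


if __name__ == "__main__":
    n = 3
    # Quorum reads and writes: overlapping sets, so reads see the latest write.
    print(reads_see_latest_write(n, quorum(n), quorum(n)))
    # Single-replica reads and writes: more available, but reads may be stale.
    print(reads_see_latest_write(n, 1, 1))
```

Lower values of R and W let requests succeed while more nodes are down, at the cost of possibly stale reads; higher values trade some availability for consistency.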

The previous version of Cassandra (0.5) added load balancing and significantly improved bootstrap and concurrency.  New tools were also added, including JSON-based data import and export, new JMX metrics, and an improved command line interface.  "It's fantastic seeing the Project's community at the ASF grow to match the promise of the technology," said Ellis.

You can download Cassandra 0.6 now on the project's website.  For more info on Cassandra, check out "4 Months with Cassandra, a love story."

Comments

Endre Varga replied on Tue, 2010/04/13 - 10:44am

Wow, this is cool! Lack of MapReduce was the only missing feature for me.

Kate Lewis replied on Thu, 2010/04/22 - 2:22am

Very informative post indeed! I am interested in knowing more on Mapreduce and Large Data analytics.. This one resource looks great... High Performance Analytics with Hadoop http://www.impetus.com/featured_webinar?eventid=16
