Creator of the Apache Tapestry web application framework and the Apache HiveMind dependency injection container. Howard has been an active member of the Java community since 1997. He specializes in all things Tapestry, including on-site Tapestry training and mentoring, but has lately been spreading out into fun new areas including functional programming (with Clojure) and NodeJS.

Node and Callbacks

03.12.2012

One of the fears people have with Node is the callback model. Node operates as a single thread: you must never do any work that blocks, especially I/O, because with only a single thread of execution, any blocking call stalls the entire process.

Instead, everything is organized around callbacks: you ask an API to do some work, and it invokes a callback function you provide when the work completes, at some later time. There are some significant tradeoffs here ... on the one hand, the traditional Java Servlet API approach involves multiple threads and mutable state in those threads, and often those threads are in a blocked state while I/O (typically, communicating with a database) is in progress. However, multiple threads and mutable data mean locks, deadlocks, and all the other unwanted complexity that comes with them.

By contrast, Node is a single thread, and as long as you play by the rules, all the complexity of dealing with mutable data goes away. You don't, for example, save data to your database, wait for it to complete, then return a status message over the wire: you save data to your database, passing a callback. Some time later, when the data is actually saved, your callback is invoked, at which point you can return your status message. It's certainly a trade-off: some of the local code is more complicated and a bit harder to grasp, but the overall architecture can be lightning fast, stable, and scalable ... as long as everyone plays by the rules.
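
To make the "save, passing a callback" idea concrete, here is a minimal sketch. The names saveToDatabase and handleSave are made up for illustration, and setImmediate stands in for real database I/O; it is not taken from any particular driver.

```javascript
// Hypothetical stand-in for a database driver: it completes on a later turn
// of the event loop and reports the result through an error-first callback.
function saveToDatabase(record, callback) {
  setImmediate(function () {
    callback(null, { id: 1, title: record.title });
  });
}

// Records the order in which things happen, so the control flow is visible.
const events = [];

function handleSave(record, sendStatus) {
  events.push("save requested");
  saveToDatabase(record, function (err, saved) {
    if (err) return sendStatus({ result: "error" });
    events.push("save completed");
    sendStatus({ result: "ok", id: saved.id });
  });
  // Execution reaches this line before the save has completed; the single
  // thread is free to do other work in the meantime.
  events.push("handler returned");
}

handleSave({ title: "Example" }, function (status) {
  console.log(events.join(" -> "), status);
});
```

The point of the sketch is the ordering: "handler returned" is recorded before "save completed", because the status message is sent from inside the callback rather than after a blocking wait.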

Still, the callback approach makes people nervous, because deeply nested callbacks can be hard to follow. I've seen this when teaching Ajax as part of my Tapestry Workshop.

I'm just getting started with Node, but I'm building an application that is very client-centered; the Node server mostly exposes a stateless, RESTful API. In that model, the Node server doesn't do anything too complicated that requires nested callbacks, and that's nice. You basically figure out a single operation based on the URL and query parameters, execute some logic, and have the callback send a response.
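
As a rough illustration of that shape, here is a sketch written against Node's core http request and response objects rather than the framework used in the listings below. The findQuizzes function, its sample data, and the location query parameter are all hypothetical.

```javascript
// Hypothetical data-access function; a real one would query a database.
function findQuizzes(filter, callback) {
  const all = [
    { title: "Quiz A", location: "Portland" },
    { title: "Quiz B", location: "Undisclosed" }
  ];
  setImmediate(function () {
    const matches = all.filter(function (quiz) {
      return !filter.location || quiz.location === filter.location;
    });
    callback(null, matches);
  });
}

// Stateless handler: everything it needs comes from the request itself.
function handleRequest(req, res) {
  const url = new URL(req.url, "http://localhost");
  if (req.method !== "GET" || url.pathname !== "/api/quizzes") {
    res.writeHead(404, { "Content-Type": "application/json" });
    return res.end(JSON.stringify({ error: "not found" }));
  }
  // One operation, chosen from the URL and its query parameters.
  const filter = { location: url.searchParams.get("location") };
  findQuizzes(filter, function (err, docs) {
    if (err) {
      res.writeHead(500, { "Content-Type": "application/json" });
      return res.end(JSON.stringify({ error: "lookup failed" }));
    }
    // The callback, not the handler body, sends the response.
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(docs));
  });
}

// To serve it: require("http").createServer(handleRequest).listen(3000);
```

There is only one level of callback here, which is why this style of server stays easy to read.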

There are still a few places where you might need an extra level of callbacks. For example, I have a (temporary) API for creating a bunch of test data, at the URL /api/create-test-data. I want to create 100 new Quiz objects in the database, then once they are all created, return a list of all the Quiz objects in the database. Here's the code:

var Quiz, schema, sendJSON;

schema = require("../schema");

Quiz = schema.Quiz;

sendJSON = function(res, json) {
  res.contentType("text/json");
  return res.send(JSON.stringify(json));
};

module.exports = function(app) {
  app.get("/api/quizzes", function(req, res) {
    return Quiz.find({}, function(err, docs) {
      if (err) throw err;
      return sendJSON(res, docs);
    });
  });
  app["delete"]("/api/quizzes/:id", function(req, res) {
    console.log("Deleting quiz " + req.params.id);
    return Quiz.remove({
      _id: req.params.id
    }, function(err) {
      if (err) throw err;
      return sendJSON(res, {
        result: "ok"
      });
    });
  });
  return app.get("/api/create-test-data", function(req, res) {
    var i, keepCount, remaining, _results;
    remaining = 100;
    keepCount = function(err) {
      if (err) throw err;
      remaining--;
      if (remaining === 0) {
        return Quiz.find({}, function(err, docs) {
          if (err) throw err;
          return sendJSON(res, docs);
        });
      }
    };
    _results = [];
    for (i = 1; 1 <= remaining ? i <= remaining : i >= remaining; 1 <= remaining ? i++ : i--) {
      _results.push(new Quiz({
        title: "Test Quiz \# " + i
      }).save(keepCount));
    }
    return _results;
  });
};

It should be pretty easy to pick out the logic for creating test data at the end. This is normal Node JavaScript, but if it looks a little odd, that's because it's actually compiled CoffeeScript. For me, the first rule of coding Node is: always code in CoffeeScript! In its original form, the nesting of the callbacks is a bit more palatable:

# Exports a single function that is passed the application object, to configure
# its routes

schema = require "../schema"
Quiz = schema.Quiz

sendJSON = (res, json) ->
  res.contentType "text/json"
  # TODO: It would be cool to prettify this in development mode
  res.send JSON.stringify(json)

module.exports = (app) ->

  app.get "/api/quizzes",
    (req, res) ->
      Quiz.find {}, (err, docs) ->
        throw err if err
        sendJSON res, docs

  app.delete "/api/quizzes/:id",
    (req, res) ->
      console.log "Deleting quiz #{req.params.id}"
      # very dangerous! Need to add some permissions checking
      Quiz.remove { _id: req.params.id }, (err) ->
        throw err if err
        sendJSON res, { result: "ok" }

  app.get "/api/create-test-data",
    (req, res) ->
      remaining = 100

      keepCount = (err) ->
        throw err if err
        remaining--

        if (remaining == 0)
          Quiz.find {}, (err, docs) ->
            throw err if err
            sendJSON res, docs

      for i in [1..remaining]
        new Quiz(title: "Test Quiz \# #{i}").save keepCount

What you have there is a count, remaining, and a single callback that is invoked for each Quiz object that is saved. When that count hits zero (we only expect each callback to be invoked once), it is safe to query the database and, in the callback from that query, send a final response. Notice the slightly odd structure, where we tend to define the final step (doing the final query and sending the response) first, then layer on top of that the code that does the work of adding Quiz objects, with the callback that figures out when all the objects have been created.
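
The same counting idea can be pulled out into a small standalone helper. The sketch below is one possible shape, not the code from the application: countdown and fakeSave are hypothetical names, and unlike the listing above it hands the first error to the final callback instead of throwing (an exception thrown inside an asynchronous callback is not caught by the code that started the operation). It assumes total is at least 1.

```javascript
// Returns a callback to hand to each of `total` asynchronous operations.
// `done` is called exactly once: with the first error, or with null after
// every operation has reported success.
function countdown(total, done) {
  let remaining = total;
  let finished = false;
  return function (err) {
    if (finished) return;        // ignore anything after the final step ran
    if (err) {
      finished = true;
      return done(err);
    }
    remaining--;
    if (remaining === 0) {
      finished = true;
      done(null);
    }
  };
}

// Hypothetical stand-in for `new Quiz(...).save(callback)`.
function fakeSave(title, callback) {
  setImmediate(function () { callback(null); });
}

const keepCount = countdown(3, function (err) {
  console.log(err ? "failed: " + err.message : "all saved");
});
for (let i = 1; i <= 3; i++) {
  fakeSave("Test Quiz # " + i, keepCount);
}
```

As in the original, the final step is defined first and the work that triggers it comes afterwards.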

The CoffeeScript makes this a bit easier to follow, but between the ordering of the code and the three levels of callbacks, it is far from perfect, so I thought I'd come up with a simple solution for managing things more sensibly. Note that I'm 100% certain that this issue has been tackled by any number of developers previously ... I'm using the need to get comfortable with Node and CoffeeScript as an excuse to embrace some Not Invented Here syndrome. Here's my first pass:

event = require "events"
_ = require "underscore"

# Helps to organize callbacks.  At this time, it breaks normal
# conventions and makes no attempt to catch errors or fire an 'error'
# event.
class Flow extends event.EventEmitter

  constructor: ->
    # Initialize the EventEmitter base class before touching instance state
    super()
    @count = 0
    # Array of zero-arg functions that invoke join callbacks
    @joins = []

  invokeJoins: ->
    # The join callbacks may add further callbacks or further join
    # callbacks, but that only affects future completions.
    joins = @joins
    @joins = []
    join.call(null) for join in joins
    @emit 'join', this

  checkForJoin: ->
    @invokeJoins() if --@count == 0

  # Adds a callback and returns a function that will invoke the
  # callback. Adding a callback increases the count. The count is
  # decreased after the callback is invoked.  Callbacks are invoked
  # with this set to null.  Join callbacks are invoked when the count
  # reaches zero. Callbacks should be added before join callbacks are
  # added.
  add: (callback) ->

    # One more callback until we can invoke join callbacks
    @count++

    (args...) =>
      # apply expects the arguments as a single array, so no splat here
      callback.apply null, args

      @checkForJoin()

  # Adds a join callback, which will be invoked after all previously
  # added callbacks have been invoked. Join callbacks are invoked with
  # this set to null and no arguments. Emits a 'join' event, passing
  # this Flow, after invoking any explicitly added join callbacks.
  # Invokes the callback immediately if there are no outstanding
  # callbacks.
  join: (callback) ->

    @joins.push callback

    @invokeJoins() if @count == 0

  # TODO:
  # sub flows (for executing related tasks in parallel)

module.exports = Flow

The Flow object is a kind of factory for callback wrappers; you pass it a callback and it returns a new callback that you can pass to the other APIs. The wrapped callbacks are invoked in parallel (well, at least, in no particular order), and the join callbacks are invoked only after every callback that has been added has been invoked.

In practice, this simplifies the code quite a bit:

  app.get "/api/create-test-data",
    (req, res) ->

      flow = new Flow
      for i in [1..100]
        quiz = new Quiz
          title: "Test Quiz \# #{i}"
          location: "Undisclosed"

        quiz.save flow.add (err) ->
          throw err if err

      flow.join ->
        Quiz.find {}, (err, docs) ->
          throw err if err
          sendJSON res, docs

So instead of quiz.save (err) -> ... it becomes quiz.save flow.add (err) -> ..., or in straight JavaScript: quiz.save(flow.add(function(err) { ... })).
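
For readers following along in JavaScript rather than CoffeeScript, the Flow class could be sketched roughly as follows. This is a hand-written approximation, not compiler output, and the timers at the end merely stand in for database saves.

```javascript
const EventEmitter = require("events");

class Flow extends EventEmitter {
  constructor() {
    super();
    this.count = 0;   // callbacks added but not yet invoked
    this.joins = [];  // join callbacks waiting for the count to reach zero
  }

  // Wraps `callback`; the wrapper forwards its arguments, then checks
  // whether it was the last outstanding callback.
  add(callback) {
    this.count++;
    return (...args) => {
      callback.apply(null, args);
      if (--this.count === 0) this.invokeJoins();
    };
  }

  // Registers a join callback; runs it at once if nothing is outstanding.
  join(callback) {
    this.joins.push(callback);
    if (this.count === 0) this.invokeJoins();
  }

  invokeJoins() {
    // Swap the list out first, so joins added by a join callback only
    // affect future completions.
    const joins = this.joins;
    this.joins = [];
    joins.forEach((join) => join.call(null));
    this.emit("join", this);
  }
}

// Usage, with timers standing in for database saves.
const flow = new Flow();
const order = [];
setTimeout(flow.add(() => order.push("slow")), 20);
setTimeout(flow.add(() => order.push("fast")), 5);
flow.join(() => console.log("joined after:", order.join(", ")));
```

Whichever timer fires first, the join callback runs only once both wrapped callbacks have been invoked.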

So things are fun: I'm actually enjoying Node and CoffeeScript at least as much as I enjoy Clojure, which is nice because it's been years (if ever) since I've enjoyed the actual coding in Java (though I've liked the results of my coding, of course).

 

Published at DZone with permission of Howard Lewis Ship, author and DZone MVB. (source)
