I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world. Giorgio is a DZone MVB and is not an employee of DZone and has posted 636 posts at DZone.

Erlang: references

02.11.2013

Erlang has a mechanism for automatically generating references: unique ids that can be passed between processes to identify messages that a server has generated or intercepted earlier in its life.

References in computer science

What Erlang calls a reference appears in many fields of computer science under other names:

  • Universally Unique IDs, UUIDs in short
  • MongoDB ObjectId()

These ids are probabilistically unique not only within the process that generates them (progressively or randomly), but are structured so that they remain, with very high probability, unique across a distributed system. Yet generating them requires no coordination between nodes and no network interaction at all, unlike inserting a row into a relational table and reading back its AUTO_INCREMENT value to identify records or messages.

When references are attached to messages, they can be used:

  • to identify and classify the replies to several messages: references included in replies let the process decide which reply to process first, and which order to follow.
  • to discover whether a node has already seen a gossip message (a kind of broadcast transmission). The node keeps track of the references it has already seen, using them as unique identifiers of messages, so that it can discard duplicates and avoid unnecessary forwarding.
  • to identify data correlated to a message, by sending only the reference in the message and keeping the data in a data structure keyed by the reference. The process is not stateless in this case, but network traffic is reduced to a minimum.
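The gossip case in the list above can be sketched roughly as follows. This is a minimal illustration, not a real gossip protocol: the module name gossip_sketch, its functions and the message shapes are all hypothetical, and the node only remembers references instead of actually forwarding anything.

```erlang
-module(gossip_sketch).
-export([start/0, gossip/3, seen_count/1]).

%% All names here are hypothetical and exist only for illustration.

%% Start a node process that remembers which references it has seen.
start() ->
    spawn(fun() -> loop(sets:new()) end).

%% Deliver a gossip message, tagged with a reference, to a node.
gossip(Node, Reference, Payload) ->
    Node ! {gossip, Reference, Payload},
    ok.

%% Ask the node how many distinct messages it has accepted so far.
seen_count(Node) ->
    Request = make_ref(),
    Node ! {seen_count, self(), Request},
    receive
        {Request, Count} -> Count
    after 1000 ->
        timeout
    end.

loop(Seen) ->
    receive
        {gossip, Reference, _Payload} ->
            case sets:is_element(Reference, Seen) of
                true ->
                    %% Duplicate: drop it instead of forwarding it again.
                    loop(Seen);
                false ->
                    %% First sighting: remember the reference
                    %% (a real node would also forward the message here).
                    loop(sets:add_element(Reference, Seen))
            end;
        {seen_count, Sender, Request} ->
            Sender ! {Request, sets:size(Seen)},
            loop(Seen)
    end.
```

Sending the same reference twice and a fresh one once should leave the node with two remembered messages, since the second copy is recognized as a duplicate.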

Show us the code!

References are unique, for all practical purposes, among connected Erlang nodes; two calls to make_ref() should never return the same reference:

references_are_never_equal_test() ->
  ?assertNotEqual(make_ref(), make_ref()).

References can be put inside messages exchanged between processes, and that's where they are most useful: message sends are always asynchronous and, when more than two processes are involved, the order in which messages arrive is not guaranteed.

In this example, a reference is passed back in a reply:

references_can_be_passed_in_requests_test() ->
  AnotherProcess = spawn(fun() -> echo() end),
  Identifier = make_ref(),
  AnotherProcess ! {self(), Identifier, "Hello"},
  receive
    {Identifier, Content} -> ?assertEqual("Hello", Content)
  end.

echo() ->
  receive
    {Sender, Identifier, Content} ->
      Sender ! {Identifier, Content}
  end.
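Because the send is asynchronous, a receive without a timeout blocks forever if the reply never comes. A possible variation on the echo example, assuming we are happy to give up after a while, adds an after clause. The module name echo_timeout_sketch and the functions start_echo/0 and request/3 are hypothetical names chosen for this sketch.

```erlang
-module(echo_timeout_sketch).
-export([start_echo/0, request/3]).

%% Spawn a process that echoes a single request back and then exits.
start_echo() ->
    spawn(fun() ->
        receive
            {Sender, Identifier, Content} ->
                Sender ! {Identifier, Content}
        end
    end).

%% Send Content to Pid and wait at most Timeout milliseconds for the
%% reply carrying the same reference.
%% Returns {ok, Reply} or {error, timeout}.
request(Pid, Content, Timeout) ->
    Identifier = make_ref(),
    Pid ! {self(), Identifier, Content},
    receive
        {Identifier, Reply} -> {ok, Reply}
    after Timeout ->
        {error, timeout}
    end.
```

If a reply shows up after the timeout has expired, it stays in the mailbox, but since it carries the reference of the old request it will not be mistaken for the reply to a later one.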

Let's now see a more complete example, involving more than two processes.

Our test is:

references_help_to_distinguish_between_messages_test() ->
  Calculator = api_calculator(),
  Promise5 = api_square(Calculator, 5),
  Promise6 = api_square(Calculator, 6),
  receive
    {Promise5, SquareOf5} -> ?assertEqual(25, SquareOf5)
  end,
  receive
    {Promise6, SquareOf6} -> ?assertEqual(36, SquareOf6)
  end.

We create a Calculator, which is a separate process that could live on another machine. We send it two requests in succession: calculate the square of 5 and of 6. These two requests stand in for expensive operations that may take an arbitrary amount of time: we don't know which will finish first, or even start first, as the server may give them different priorities (the only guarantee we require is liveness, meaning they will be processed eventually and not left to rot forever).

However, in the test, we force the order in which we process the responses. We do so by pattern matching the incoming messages on the references we obtained when sending the requests. These references are called promises here because they let us go ahead with other work while still being able to retrieve the reply when it arrives in the future.
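The effect of matching on an already bound reference can be seen in isolation, without any server. In this small sketch (the module name selective_receive_sketch and the function demo/0 are hypothetical), a process puts two tagged messages in its own mailbox in one order and takes them out in the opposite order:

```erlang
-module(selective_receive_sketch).
-export([demo/0]).

%% Queue two tagged messages to ourselves, then receive them in the
%% opposite order by matching on the references.
demo() ->
    First = make_ref(),
    Second = make_ref(),
    self() ! {Second, "arrived first"},
    self() ! {First, "arrived second"},
    %% First is already bound, so this receive skips the message tagged
    %% with Second and leaves it in the mailbox.
    A = receive {First, ContentA} -> ContentA end,
    B = receive {Second, ContentB} -> ContentB end,
    [A, B].
```

The order of the returned list is decided by the receive expressions, not by the order in which the messages arrived.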

The api_*() functions start the server:

api_calculator() -> spawn(fun() -> server() end).

and send messages to it containing a generated reference:

api_square(Calculator, Number) ->
  Reference = make_ref(),
  Calculator ! {self(), Reference, Number},
  Reference.

The server receives the message, spawns a new process, and continues.

server() ->
  receive
    {Sender, Reference, Base} ->
      spawn(fun() -> calculate_square(Sender, Reference, Base) end)
  end,
  server().

calculate_square(Sender, Reference, Base) ->
  Sender ! {Reference, Base*Base}.

And that's it: every client is able to uniquely identify its own requests, no matter how many nodes or requests are involved concurrently.
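One possible refinement, which is not part of the example above, is to let a monitor supply the reference: erlang:monitor/2 returns a reference that can serve as the request identifier and also tells the client if the server dies before answering. The sketch below assumes a server with the same message format as above; the module name monitor_call_sketch and the functions start/0 and square/3 are hypothetical.

```erlang
-module(monitor_call_sketch).
-export([start/0, square/3]).

%% A server with the same protocol as in the article: every request is
%% handed to a freshly spawned worker, which replies to the sender.
start() ->
    spawn(fun server/0).

server() ->
    receive
        {Sender, Reference, Base} ->
            spawn(fun() -> Sender ! {Reference, Base * Base} end)
    end,
    server().

%% Ask Calculator for the square of Number, waiting at most Timeout ms.
%% The monitor reference doubles as the request identifier.
square(Calculator, Number, Timeout) ->
    Reference = erlang:monitor(process, Calculator),
    Calculator ! {self(), Reference, Number},
    receive
        {Reference, Square} ->
            %% Got the reply: drop the monitor and any pending 'DOWN'.
            erlang:demonitor(Reference, [flush]),
            {ok, Square};
        {'DOWN', Reference, process, Calculator, Reason} ->
            %% The server died (or was already dead) before replying.
            {error, {server_down, Reason}}
    after Timeout ->
        erlang:demonitor(Reference, [flush]),
        {error, timeout}
    end.
```

With this shape the client does not have to wait for the full timeout when the server is gone, because the 'DOWN' message carries the same reference it is already waiting on.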

Published at DZone with permission of Giorgio Sironi, author and DZone MVB.
