I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world. Giorgio is a DZone MVB and is not an employee of DZone and has posted 636 posts at DZone.

Erlang: binaries and bitstrings

01.30.2013

Erlang has language support for efficiently manipulating sequences of bytes and bits, offering a low-level interface to these streams while retaining the comfort of a language that runs on a virtual machine.

So I wrote some exploratory tests while studying and coding along with the Erlang Programming book. It's definitely a recommended source: it extracts from Erlang all the bits (pun intended) that you can't find in other languages, while going quickly over how to write a for loop (actually a list comprehension or a map operation).

Binaries

Binaries represent ordered sequences of bytes in Erlang. It's easy to pack Erlang data structures into binaries to send them over the wire, and I tried doing that with an atom:

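-include_lib("eunit/include/eunit.hrl").  % provides ?assertEqual and the other assert macros

binary_representation_test() ->
  Bin = term_to_binary(a),
  % bytes observed when this was written; the encoding can differ between OTP releases
  ?assertEqual(<<131, 100, 0, 1, 97>>, Bin).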

That was a sequence of 40 bits, organized in 5 bytes. The <<>> syntax expresses a sequence of bytes or bits directly, and if you don't specify a size, each comma-separated value is treated as an integer one byte (8 bits) long.
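To check the default segment size, here is a small sketch in the same EUnit style (the module and test names are hypothetical). It also shows what happens when a value doesn't fit in a byte: integer segments keep only their low-order bits, so 300 becomes 44.

```erlang
-module(segment_defaults_sketch).
-include_lib("eunit/include/eunit.hrl").

%% Without a size, each integer segment is 8 bits (one byte) long.
default_segment_size_test() ->
    Bin = <<1, 2, 3>>,
    ?assertEqual(3, byte_size(Bin)),
    ?assertEqual(24, bit_size(Bin)).

%% A value that does not fit in 8 bits keeps only its low-order 8 bits:
%% 300 rem 256 =:= 44.
truncation_test() ->
    Value = 300,
    ?assertEqual(<<44>>, <<Value>>).
```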

You can convert them back to Erlang data structures with just the API you would expect:

binary_conversions_test() ->
  Bin = term_to_binary(a),
  ?assertEqual(a, binary_to_term(Bin)).

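This is Erlang's external term format: 131 is its version tag, and 100 marks an atom, followed by a two-byte length (0, 1) and the atom's characters (97 is "a"). The exact bytes can change between OTP releases, so it goes without saying that you shouldn't rely on this implementation detail, but only use the conversion primitives.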

We are not limited to atoms of course: I've tested this serialization mechanism with a more complex, nested data structure.

binary_complex_conversions_test() ->
  Bin = term_to_binary({a, 42, {b, c}}),
  ?assertEqual({a, 42, {b, c}}, binary_to_term(Bin)).
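term_to_binary/2 also accepts a list of options. As a sketch (the module name is hypothetical), the compressed option shrinks a repetitive term, and binary_to_term/1 decodes the result without being told it was compressed:

```erlang
-module(compressed_term_sketch).
-include_lib("eunit/include/eunit.hrl").

%% The compressed option zlib-compresses the external format; binary_to_term/1
%% recognises the compressed form by itself.
compressed_roundtrip_test() ->
    Term = {a, 42, {b, c}, lists:duplicate(1000, ok)},
    Plain = term_to_binary(Term),
    Compressed = term_to_binary(Term, [compressed]),
    %% A highly repetitive term shrinks a lot once compressed.
    ?assert(byte_size(Compressed) < byte_size(Plain)),
    ?assertEqual(Term, binary_to_term(Compressed)).
```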

Let's build an RGB triplet, with one byte for each color component, for 24-bit color:

bit_building_test() ->
  G=6*16+6,
  B=9*16+9,
  Bin = <<0, G, B>>,
  ?assertEqual([0, G, B], binary_to_list(Bin)).
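The same three bytes can also be read back as a single 24-bit integer. Segments are big-endian unless you say otherwise, so the first byte ends up in the most significant position. A sketch with a hypothetical module name, writing the components as hexadecimal literals (16#66 is 6*16+6, 16#99 is 9*16+9):

```erlang
-module(rgb_sketch).
-include_lib("eunit/include/eunit.hrl").

%% Read the triplet as one 24-bit big-endian integer, then rebuild it.
rgb_as_integer_test() ->
    Bin = <<0, 16#66, 16#99>>,
    <<Color:24>> = Bin,
    ?assertEqual(16#006699, Color),
    %% A Color:24 segment turns the integer back into the same three bytes.
    ?assertEqual(Bin, <<Color:24>>).
```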

In this case there's no point in cutting the binary into segments smaller than a byte, and the same holds for most high-level programming.

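We can also pattern match over the bytes contained in the binary. Here the pattern lists exactly as many segments as there are bytes; G is already bound, so it is compared against the second byte, while the new variable Blue captures the third: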

bit_pattern_matching_test() ->
  G=6*16+6,
  B=9*16+9,
  Bin = <<0, G, B>>,
  <<0, G, Blue>> = Bin,
  ?assertEqual(B, Blue).
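A pattern with three segments only matches a binary of exactly three bytes, and an already-bound variable is compared rather than rebound. The helpers classify/1 and blue_if_green/2 below are hypothetical, written only to make both points testable:

```erlang
-module(rgb_match_sketch).
-export([classify/1, blue_if_green/2]).
-include_lib("eunit/include/eunit.hrl").

%% Hypothetical helper: a three-segment pattern only matches a 3-byte binary.
classify(<<R, G, B>>) -> {rgb, R, G, B};
classify(_Other) -> not_rgb.

%% Hypothetical helper: Green is already bound when the case is evaluated,
%% so the pattern compares it with the second byte instead of rebinding it.
blue_if_green(Green, Bin) ->
    case Bin of
        <<0, Green, Blue>> -> {ok, Blue};
        _ -> error
    end.

classify_test() ->
    ?assertEqual({rgb, 0, 16#66, 16#99}, classify(<<0, 16#66, 16#99>>)),
    ?assertEqual(not_rgb, classify(<<0, 16#66>>)),
    ?assertEqual(not_rgb, classify(<<0, 16#66, 16#99, 1>>)).

blue_if_green_test() ->
    ?assertEqual({ok, 16#99}, blue_if_green(16#66, <<0, 16#66, 16#99>>)),
    ?assertEqual(error, blue_if_green(16#66, <<0, 1, 2>>)).
```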

Bitstrings

While binary is the Erlang term for sequences of bytes, sequences of bits are organized in bitstrings. They are actually the same structure: a binary is just a bitstring whose length in bits is a multiple of 8.
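The relationship shows up in the is_binary/1 and is_bitstring/1 guards. A quick sketch (hypothetical module name); note that byte_size/1 rounds a partial byte up:

```erlang
-module(bitstring_sketch).
-include_lib("eunit/include/eunit.hrl").

%% A whole number of bytes is both a binary and a bitstring.
binary_is_bitstring_test() ->
    Byte = <<65>>,
    ?assert(is_binary(Byte)),
    ?assert(is_bitstring(Byte)).

%% Three bits are a bitstring, but not a binary.
bitstring_not_binary_test() ->
    Bits = <<1:3>>,
    ?assertEqual(3, bit_size(Bits)),
    ?assertEqual(1, byte_size(Bits)),
    ?assert(is_bitstring(Bits)),
    ?assertNot(is_binary(Bits)).
```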

When you work at the IP, UDP or TCP level, efficiency is highly valued, and Erlang lets you extract single bits (or N-bit sequences) directly from bitstrings. In languages like C or Java, you would have to build an abstraction over bytes yourself to work at the bit level (using shifts and bitmasks).
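As an illustration, the first byte of an IPv4 header packs two 4-bit fields, Version and IHL (header length in 32-bit words), and is usually 16#45. The sketch below (module and function names are hypothetical) extracts them with bit syntax and, for comparison, with the shift-and-mask approach you would use in C or Java:

```erlang
-module(ipv4_nibbles_sketch).
-export([version_and_ihl/1, version_and_ihl_masks/1]).
-include_lib("eunit/include/eunit.hrl").

%% Bit syntax: name the two 4-bit fields directly.
version_and_ihl(<<Version:4, Ihl:4, _Rest/binary>>) ->
    {Version, Ihl}.

%% The same fields with a shift and a mask on the whole first byte.
version_and_ihl_masks(<<Byte, _Rest/binary>>) ->
    {Byte bsr 4, Byte band 16#0F}.

nibbles_test() ->
    Header = <<16#45, 0, 0, 20>>,
    ?assertEqual({4, 5}, version_and_ihl(Header)),
    ?assertEqual(version_and_ihl(Header), version_and_ihl_masks(Header)).
```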

To show how you can work at the bit level, let's build a character (8 bits) from the combination of two different 4-bit sequences. A is 65 (0x41) in ASCII, so viewed as two 4-bit sequences it is 0100 0001:

bin_types_test() ->
  ?assertEqual(<<"A">>, <<4:4, 1:4>>).

With the :4 suffix we are telling Erlang to encode each value as a segment 4 bits long; the two segments together make up the single byte of "A".
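Sizes don't have to be 4 or 8. The 16-bit RGB565 pixel layout, used here only as an illustration, packs 5 bits of red, 6 of green and 5 of blue; a sketch with hypothetical pack/3 and unpack/1 functions:

```erlang
-module(rgb565_sketch).
-export([pack/3, unpack/1]).
-include_lib("eunit/include/eunit.hrl").

%% 5 + 6 + 5 = 16 bits, so the result is a 2-byte binary.
pack(R, G, B) -> <<R:5, G:6, B:5>>.

%% The same segment sizes split the two bytes back into components.
unpack(<<R:5, G:6, B:5>>) -> {R, G, B}.

rgb565_test() ->
    %% Full red and full blue: 11111 000000 11111 = 16#F8 16#1F.
    Magenta = pack(31, 0, 31),
    ?assertEqual(<<16#F8, 16#1F>>, Magenta),
    ?assertEqual({31, 0, 31}, unpack(Magenta)).
```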

Let's do something slightly more complex: pattern matching over these structures.

bin_pattern_matching_with_types_test() ->
  Bin = <<"Answer", 42, "ok">>,
  <<"Answer", Int, Result/binary>> = Bin,
  ?assertEqual(42, Int),
  ?assertEqual(<<"ok">>, Result).

The first pattern we match is the fixed value "Answer"; no problem with that.

The second pattern is a variable, which we name Int. Without a size or type specifier, a segment matches an unsigned integer 8 bits long, which is the most common case.

The third pattern extracts the rest of Bin as a binary. Since packets and segments usually have a fixed-length header followed by a variable-length payload, it is customary to extract the payload with this expression.
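Since the header usually says how long the payload is, a segment size can come from a variable bound earlier in the same pattern. The sketch below assumes a hypothetical layout (a 16-bit length, then the payload) and a hypothetical parse/1 that returns more when the data is incomplete; it also shows the /little and /signed type specifiers overriding the defaults:

```erlang
-module(length_prefixed_sketch).
-export([parse/1]).
-include_lib("eunit/include/eunit.hrl").

%% Hypothetical layout: 16-bit length (little-endian here only to show a type
%% specifier; the default is big-endian), then Len bytes of payload.
%% Len is bound earlier in the same pattern, so it can size Payload.
parse(<<Len:16/little, Payload:Len/binary, Rest/binary>>) ->
    {ok, Payload, Rest};
parse(_Incomplete) ->
    more.

parse_test() ->
    ?assertEqual({ok, <<"ok">>, <<"extra">>}, parse(<<2, 0, "ok", "extra">>)),
    %% The length field says 5, but only 2 payload bytes are present.
    ?assertEqual(more, parse(<<5, 0, "ok">>)).

%% Unadorned segments are unsigned 8-bit integers; /signed reads the same
%% byte as a two's complement value.
defaults_test() ->
    <<U>> = <<255>>,
    <<S:8/signed>> = <<255>>,
    ?assertEqual({255, -1}, {U, S}).
```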

Conclusions

Erlang really has batteries included for its use case: low-latency communication gateways and distributed systems. The language itself mixes higher-level functions such as map and filter with the possibility to segment streams at the low level.
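Bitstring comprehensions are where the two meet: a generator like <<X>> <= Bin walks a binary segment by segment, much as a list comprehension walks a list. A sketch (hypothetical module name) that splits "A" back into its two 4-bit halves and inverts each byte of the RGB triplet built earlier:

```erlang
-module(comprehension_sketch).
-include_lib("eunit/include/eunit.hrl").

%% A list comprehension whose generator takes 4 bits at a time.
nibbles_test() ->
    ?assertEqual([4, 1], [N || <<N:4>> <= <<"A">>]).

%% A binary comprehension: each byte X becomes the byte 255 - X.
invert_test() ->
    Bin = <<0, 16#66, 16#99>>,
    ?assertEqual(<<255, 16#99, 16#66>>, << <<(255 - X)>> || <<X>> <= Bin >>).
```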

In other languages you would have to write a parser yourself, while Erlang allows you to describe the content of a binary message declaratively and then use that description to extract the interesting pieces. By doing this in the language instead of in additional libraries, Erlang also opens the door to optimization.
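To make the declarative style concrete, here is a sketch of a parser for a hypothetical type-length-value format (8-bit type, 8-bit length, then the value), recursing over the records. The format and the names are illustrative assumptions, not a real protocol:

```erlang
-module(tlv_sketch).
-export([parse/1]).
-include_lib("eunit/include/eunit.hrl").

%% Each clause describes the data it accepts; malformed input matches no
%% clause and raises a function_clause error.
parse(<<>>) ->
    [];
parse(<<Type, Len, Value:Len/binary, Rest/binary>>) ->
    [{Type, Value} | parse(Rest)].

parse_test() ->
    Msg = <<1, 2, "hi", 7, 0, 9, 1, 42>>,
    ?assertEqual([{1, <<"hi">>}, {7, <<>>}, {9, <<42>>}], parse(Msg)).
```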

Check out all the code used in these examples on GitHub.

Published at DZone with permission of Giorgio Sironi, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)