Dr. Axel Rauschmayer is a freelance software engineer, blogger and educator, located in Munich, Germany. Axel is a DZone MVB and is not an employee of DZone and has posted 246 posts at DZone. You can read more from them at their website.

Quasi-literals: embedded DSLs in ECMAScript.next

09.21.2011

 Quasi-literals [1] are a syntactic construct that facilitates the implementation of embedded domain-specific languages (DSLs) in JavaScript. They are currently slated for inclusion in the next version of ECMAScript [2]. This post explains how quasi-literals work.

Introduction

The idea is as follows: A quasi-literal (short: a quasi) is similar to a string literal and a regular expression literal in that it provides a simple syntax for creating data. The following is an example.
    quasiHandler`Hello ${firstName} ${lastName}`
This is just a compact way of writing (roughly) the following function call:
    quasiHandler("Hello ", firstName, " ", lastName)
Thus, the name before the content in backquotes is the name of a function to call, the quasi handler. The handler receives two different kinds of data:
  • Literal sections such as "Hello ".
  • Substitutions such as firstName (delimited by a dollar sign and braces). A substitution can be any expression. If the substitution is simply an identifier, you can omit the delimiting braces:
        quasiHandler`Hello $firstName $lastName`
    
Literal sections are known statically; substitutions are only known at runtime. Because many handlers need to distinguish between these two kinds of data (see the examples below), the actual handler invocation is slightly more complex than shown above: it passes the literal sections separately from the substitutions.
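To make the two kinds of data concrete, here is a minimal sketch written in the syntax that was eventually standardized in ECMAScript 2015 as tagged templates. It differs from the proposal described here in two assumed ways: the handler receives all literal sections as an array in its first parameter, and the brace-less $firstName shorthand is not supported, so every substitution uses ${...}. The handler names quasiHandler and plain are illustrative only.

```javascript
// Returns the two kinds of data a handler receives, so they can be inspected.
function quasiHandler(literals, ...substitutions) {
  // literals: the statically known literal sections, as an array of strings.
  // substitutions: the values of the ${...} expressions, computed at runtime.
  return { literals: [...literals], substitutions };
}

// Interleaving literals and substitutions rebuilds the text an untagged literal would produce.
function plain(literals, ...substitutions) {
  let result = literals[0];
  substitutions.forEach((value, i) => {
    result += String(value) + literals[i + 1];
  });
  return result;
}

const firstName = "Jane";
const lastName = "Doe";
const received = quasiHandler`Hello ${firstName} ${lastName}`;
// received.literals: ['Hello ', ' ', ''], received.substitutions: ['Jane', 'Doe']
const text = plain`Hello ${firstName} ${lastName}`; // 'Hello Jane Doe'
```

Note that there is always one more literal section than there are substitutions, which is why plain can start with literals[0] and then append one substitution and one literal per step.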

Examples

Quasis are versatile because a quasi-literal becomes a function call and because the text that this function receives is structured. Therefore, implementing a new domain-specific language only requires writing a new function. The following examples are taken from [1] (which you can consult for details):
  • Raw strings: string literals that can span multiple lines and in which escape sequences are not interpreted.
        var str = raw`This is a text
        with multiple lines.
        Escapes are not interpreted,
        \n is not a newline.`;
    
  • Parameterized regular expression literals: There are two ways of creating regular expression instances.
    • Statically, via a regular expression literal.
    • Dynamically, via the RegExp constructor.
    You use the latter when some ingredients only become available at runtime: usually, you are concatenating regular expression fragments with text that is to be matched verbatim. That text must be escaped properly (dots, square brackets, etc.). A regular expression handler re can help with this task: its literal sections are regular expression syntax, while its substitutions are escaped and matched verbatim:
        re`\d+(${localeSpecificDecimalPoint}\d+)?`
    
  • Query languages: Here is an example.
        $`a.${className}[href=~'//${domain}/']`
    
    This is a DOM query that looks for all <a> tags whose CSS class is className and whose href points to a URL with the given domain. The quasi handler $ ensures that the substitutions are correctly escaped, making this approach safer than manual string concatenation.
  • Text localization (L10N): L10N has two components: the language and the locale (how to format numbers, time, etc.). Consider the following message:
        alert(msg`Welcome to ${siteName}, you are visitor number ${visitorNumber}:d!`);
    
    The handler msg would work as follows.
    • It creates the following skeleton to look up a translation in a table.
          Welcome to {0}, you are visitor number {1}
      
      The translation might be:
          Besucher Nr. {1}, willkommen bei {0}!
      
    • Next, substitution metadata such as :d is extracted from the literal parts and used to format the data that is filled in. In the example, :d indicates that substitution {1} is a number to be formatted according to the locale (here, with the locale's digit-grouping separator). Thus, a possible English result is:
          Welcome to ACME Corp., you are visitor number 1,300!
      
      In German, we have results such as:
          Besucher Nr. 1.300, willkommen bei ACME Corp.!
      
  • Secure content generation: With quasis, one can distinguish between trusted content coming from the program and untrusted content coming from a user. For example:
        safehtml`<a href="${url}">${text}</a>`
    
    The literal sections come from the program; the substitutions url and text come from a user. The quasi handler safehtml can ensure that no malicious code is injected via the substitutions. For HTML, the ability to nest quasis is useful:
        var rows = [['Unicorns', 'Sunbeams', 'Puppies'], ['<3', '<3', '<3']];
        safehtml`<table>${
            rows.map(function(row) {
                return safehtml`<tr>${
                    row.map(function(cell) {
                        return safehtml`<td>${cell}</td>`
                    })
                }</tr>`
            })
        }</table>`
    
    Explanation: The rows of the table are produced by an expression: the invocation of the method rows.map(). The result of that invocation is an array whose elements are produced by nested quasis. safehtml concatenates those elements and inserts them into the surrounding markup. The cells of each row are produced in the same manner, via row.map().
  • Templates: Templates are similar to quasis in that they are text with holes in it. However, the holes are normally filled in from an object (e.g., JSON data). For example, the following is a template:
        <h1>${{title}}</h1>
        ${{content}}
    
    Using a quasi instead of a string literal to define this text has two advantages: Quasis do the parsing for you and a quasi can comprise multiple lines. A template would be defined as follows:
        var myTmpl = tmpl`
        <h1>${{title}}</h1>
        ${{content}}
        `;
    
    This works because {title} and {content} are actual ECMAScript.next expressions: {foo, bar} is syntactic sugar for {foo: foo, bar: bar}. Thus, the handler will receive a value such as { title: undefined } for the first substitution. With templates, the handler is not interested in the value of title, only in its name, and this trick gives it access to that name. A disadvantage of using a quasi in this manner is that variables such as title and content have to exist (but they don't have to have a value). Therefore, the above must be written as
        var title, content;
        var myTmpl = tmpl`
        ...
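The following is a sketch of how raw and re could be written, again in ES2015 tagged-template syntax, where the literal sections arrive as an array whose raw property holds the uninterpreted text. These are illustrative implementations, not the ones from [1]; the escaping in re follows the common approach of backslash-escaping every regular expression syntax character. ES2015 in fact ships a built-in equivalent of raw, String.raw.

```javascript
// Interleaves literal sections with converted substitution values.
// `literals` is either the cooked strings array or its `raw` property.
function interleave(literals, substitutions, convert) {
  let result = literals[0];
  substitutions.forEach((value, i) => {
    result += convert(value) + literals[i + 1];
  });
  return result;
}

// raw: uses the uninterpreted text, so \n stays a backslash followed by "n".
// (Behaves like the built-in String.raw.)
function raw(strings, ...substitutions) {
  return interleave(strings.raw, substitutions, String);
}

// Escapes characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return String(text).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// re: literal sections are regular expression syntax; substitutions are matched verbatim.
function re(strings, ...substitutions) {
  return new RegExp(interleave(strings.raw, substitutions, escapeRegExp));
}

const str = raw`This is a text
with multiple lines.
\n is not a newline.`;
const keepsBackslash = str.includes("\\n"); // true

const localeSpecificDecimalPoint = ".";
const number = re`^\d+(${localeSpecificDecimalPoint}\d+)?$`;
const accepts = number.test("3.14"); // true
const rejects = !number.test("3x14"); // true: the "." was escaped, so it no longer matches any character
```

Using strings.raw in re matters: in the cooked strings, \d would already have been turned into a plain "d".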
    
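Nested quasis and templates can be sketched in the same style. Two assumptions go beyond the text: safehtml wraps its result in a hypothetical SafeHtml class, so that the output of a nested safehtml is inserted as-is while every other substitution is HTML-escaped; and tmpl uses only the property name of each substitution object. This safehtml only escapes text and is not context-aware: for example, it would not reject a javascript: URL in an href attribute, which a real secure handler would have to do.

```javascript
// Marks markup as trusted, so that nested safehtml results are not escaped twice.
class SafeHtml {
  constructor(html) {
    this.html = html;
  }
  toString() {
    return this.html;
  }
}

// Escapes the characters that are significant in HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Converts one substitution into markup.
function toHtml(value) {
  if (value instanceof SafeHtml) return value.html; // produced by a nested safehtml
  if (Array.isArray(value)) return value.map(toHtml).join(""); // e.g. the result of rows.map()
  return escapeHtml(value); // untrusted: escape it
}

function safehtml(strings, ...substitutions) {
  let html = strings[0];
  substitutions.forEach((value, i) => {
    html += toHtml(value) + strings[i + 1];
  });
  return new SafeHtml(html);
}

const rows = [['Unicorns', 'Sunbeams', 'Puppies'], ['<3', '<3', '<3']];
const table = safehtml`<table>${
  rows.map(row => safehtml`<tr>${
    row.map(cell => safehtml`<td>${cell}</td>`)
  }</tr>`)
}</table>`;
// String(table): the '<3' cells come out as &lt;3

// tmpl: each substitution is an object such as { title: undefined };
// only its property name is used. The result fills in the holes from a data object.
function tmpl(strings, ...substitutions) {
  const names = substitutions.map(obj => Object.keys(obj)[0]);
  return function (data) {
    let result = strings[0];
    names.forEach((name, i) => {
      result += String(data[name]) + strings[i + 1];
    });
    return result;
  };
}

var title, content; // must exist so that {title} and {content} can be evaluated
const myTmpl = tmpl`<h1>${{title}}</h1>${{content}}`;
const page = myTmpl({ title: "Quasis", content: "Hello" }); // '<h1>Quasis</h1>Hello'
```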

Implementing a handler

The following is a quasi-literal:
    handlerName`lit1\n${subst1} lit2 ${subst2}`
This is transformed internally to a function call (adapted from [1]):
    // hoisted declaration.
    const callSiteId1234 = {
        raw: [ 'lit1\\n', ' lit2 ', '' ], // newline as written
        cooked: [ 'lit1\n', ' lit2 ', '' ], // newline interpreted
    };

    // in-situ
    handlerName(callSiteId1234, subst1, subst2)
The parameters of the handler are split into two categories:
  1. The call site object (callSiteId1234 above), from which you get the literal parts both with escapes such as \n interpreted (“cooked”) and uninterpreted (“raw”). The number of literal parts is always one plus the number of substitutions. If a substitution comes last in a quasi, an empty literal part is created after it (as in the example above).
  2. The substitutions, whose values become trailing parameters.
The idea is that the same literal might be evaluated multiple times (e.g., in a loop); via the call site object, the handler can cache data from previous invocations. (1) is potentially cacheable data; (2) changes with each invocation.
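In ES2015 tagged templates, the call site object became the handler's first parameter: an array of cooked strings whose raw property holds the raw strings, so callSiteId.cooked corresponds to the array itself and callSiteId.raw to its raw property. That array is the same object each time a given literal is evaluated, which makes it usable as a cache key. The sketch below (the handler names inspect and cachedMsg are hypothetical) shows both points; cachedMsg builds an L10N-style skeleton only once per call site and reuses it, but does no translation or formatting.

```javascript
// Shows the cooked and raw literal parts plus the substitutions.
function inspect(strings, ...substitutions) {
  return { cooked: [...strings], raw: [...strings.raw], substitutions };
}

const subst1 = "a", subst2 = "b";
const site = inspect`lit1\n${subst1} lit2 ${subst2}`;
// site.cooked: [ 'lit1\n', ' lit2 ', '' ]   (newline interpreted)
// site.raw:    [ 'lit1\\n', ' lit2 ', '' ]  (newline as written)

// Caches data derived from the literal parts, keyed by the call site object.
const skeletonCache = new WeakMap();
let skeletonBuilds = 0; // counts cache misses, for illustration

function cachedMsg(strings, ...substitutions) {
  let skeleton = skeletonCache.get(strings);
  if (skeleton === undefined) {
    skeletonBuilds++;
    // e.g. "Welcome to {0}, you are visitor number {1}!"
    skeleton = strings.reduce((acc, literal, i) => acc + "{" + (i - 1) + "}" + literal);
    skeletonCache.set(strings, skeleton);
  }
  // Per-invocation work: fill in this call's substitutions.
  return skeleton.replace(/\{(\d+)\}/g, (_, index) => String(substitutions[Number(index)]));
}

for (const visitorNumber of [1, 2, 3]) {
  console.log(cachedMsg`Welcome to ${"ACME Corp."}, you are visitor number ${visitorNumber}!`);
}
// skeletonBuilds is now 1: the skeleton was built once and reused.
```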

Assigning to substitutions. An extended version of quasis (that probably won’t be part of ECMAScript.next) allows one to assign to substitutions. For example:

    if (re_match`before (${=x}\d+) after`(myString)) {
        // Do something with x
    }
re_match creates a function which is immediately invoked on myString. That function returns true if myString is a match and assigns the first matching group to the variable x at the same time. Compare the above to the equivalent quasi-less JavaScript code below. Note that you need an extra variable to hold the match.
    var match = /before (\d+) after/.exec(myString);
    if (match) {
        x = match[1];
        // Do something with x
    }
To make a substitution assignable, the following translation happens: each writable substitution ${=x} is passed to the handler as the following function (other writable substitutions such as ${=obj.prop} work the same way).
    function () { return arguments.length ? (x = arguments[0]) : x }
Explanation: If you call this function with no arguments, you get the value of the substitution. If you provide an argument, it is assigned to the substitution.

In this extended version, each read-only substitution ${x} is passed to the handler as a function, too:

    function() { return x }
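Standard JavaScript never adopted the ${=x} syntax, but the translation above can be imitated by passing the accessor function explicitly. The following hypothetical sketch of re_match (named reMatch here) makes a simplifying assumption: the i-th substitution receives the value of the i-th capturing group, and the substitutions contribute nothing to the pattern itself.

```javascript
// Builds the pattern from the raw literal sections only; each substitution is an
// accessor function that receives the capturing group with the same index.
function reMatch(strings, ...accessors) {
  const pattern = new RegExp(strings.raw.join(""));
  return function (input) {
    const match = pattern.exec(input);
    if (!match) return false;
    accessors.forEach((accessor, i) => accessor(match[i + 1])); // assign group i + 1
    return true;
  };
}

let x;
// Hand-written version of what ${=x} would be translated to.
const xAccessor = function () { return arguments.length ? (x = arguments[0]) : x; };

const myString = "before 42 after";
if (reMatch`before (${xAccessor}\d+) after`(myString)) {
  console.log(x); // 42
}
```

The accessor must be a regular function rather than an arrow function, because arrow functions have no arguments object of their own.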

Conclusion

As you can see, there are many applications for quasi-literals. You might wonder why ECMAScript.next does not introduce a full-blown macro system. That is because it is quite difficult to create a macro system for a language whose syntax is as complex as JavaScript’s. This task will thus take more time and, possibly, research. There is hope, though: With much luck, we will see macros in ECMAScript 8 [3].

Acknowledgement. Thanks to Brendan Eich, Mark S. Miller, Mike Samuel, and Allen Wirfs-Brock for answering my quasis-related questions on the es-discuss mailing list.

References

  1. ECMAScript Quasi-Literals [proposal for ECMAScript.next]
  2. ECMAScript.next: the “TXJS” update by Eich
  3. A first look at what might be in ECMAScript 7 and 8

From http://www.2ality.com/2011/09/quasi-literals.html

Published at DZone with permission of Axel Rauschmayer, author and DZone MVB.