I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world. Giorgio is a DZone MVB, is not an employee of DZone, and has posted 636 posts at DZone.

Configuration is code

11.19.2013

You start out with a simple .ini file:

something.url = http://example.com/api/resource

After a while, you customize its values by deployment environment:

[development]
something.url = http://preproduction.example.com/api/resource
[production]
something.url = http://example.com/api/resource

and then substitute values in it, to remove duplication:

something.url.base = http://example.com
something.url.resource = {something.url.base}/api/resource

or substitute constants, for that matter:

something.url = "http://" . APPLICATION_ENV . ".example.com"

Finally, you start supporting dynamic values, because this gives you more flexibility:

something.url.resource = {something.url.base}/api/resource{some_condition() ? '/subresource' : ''}

The thesis of this article is that the more complex use cases of configuration can be supported efficiently, without piling up proprietary or open source libraries to parse ever more complex configuration files. The solution is to use a more powerful language: the dynamic programming language your application is already written in.

<?php
$something['url']['base'] = 'http://example.com';
$something['url']['resource'] = "{$something['url']['base']}/api/resource" . (some_condition() ? '/subresource' : '');
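
The per-environment sections from the earlier .ini stage carry over just as directly. What follows is a minimal sketch, not a prescribed layout: build_config() is a hypothetical helper, the environment name is assumed to arrive as a plain string, and the scalar type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of any framework: returns the configuration
// array for the given environment name.
function build_config(string $environment): array
{
    // Values that differ by deployment environment live in one plain array.
    $hosts = [
        'development' => 'http://preproduction.example.com',
        'production'  => 'http://example.com',
    ];

    if (!isset($hosts[$environment])) {
        throw new InvalidArgumentException("Unknown environment: $environment");
    }

    $something = [];
    $something['url']['base'] = $hosts[$environment];
    // Derived values reuse the base, so each host is written only once.
    $something['url']['resource'] = "{$something['url']['base']}/api/resource";

    return $something;
}

$config = build_config('development');
echo $config['url']['resource'], "\n"; // prints http://preproduction.example.com/api/resource
```

A typo in the environment name fails loudly with an ordinary exception instead of silently yielding an empty section.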

Back in the day of Java first-generation frameworks

This was acceptable configuration code in a Java framework (in its web.xml file):

<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
    <servlet>
        <servlet-name>comingsoon</servlet-name>
        <servlet-class>mysite.server.ComingSoonServlet</servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>comingsoon</servlet-name>
        <url-pattern>/*</url-pattern>
    </servlet-mapping>
</web-app>

Let's not focus on the verbosity of XML but on the concept of route mapping. At the time, it was widely believed that separating the servlets' Java code from their mapping to URLs was GoodDesign(TM).

Too bad servlets were always instantiated only once by the XML configuration, and coupling crept in between the mapping and the code because the same list of parameters was referenced both in the servlet's code and in the configuration file. So the dream of compiling once and configuring everywhere never worked; programmers could, however, have fun writing long XML files and complex IDEs to check them on the fly.

The nail in the coffin for this approach was duplication: if you had two similar sections of XML, you could only stare at them and hope to remember to carry changes over from one to the other. The alternative was to generate the XML at build time, or to invent more powerful semantics and abstractions and have the configuration reader accept them. To both options I can only say: good luck.

Today the Play framework or JAX-RS let you specify routes in annotations placed directly alongside the code they map to; annotations are first-class constructs in Java, and from there it's a short step to specifying routes programmatically.

Lisp and Paul Graham: how powerful a language is

Now for something completely different. You probably know the story of Paul Graham and Viaweb, where a startup of 3 people invented web applications (accounts of the story may be exaggerated). Graham and his cofounders used a dialect of Lisp as their programming language, which gave them a competitive advantage due to the sheer flexibility and power of Lisp itself.

Expressive power is not a fluffy concept: the experience led Graham to define it in an essay as the capability of a language Y to do things that in a language X are only possible by writing an interpreter for the language Y. For example, autoCurry is a library for JavaScript and PHP (and probably other languages) that emulates currying, the partial application of functions supported by functional languages:

; Lisp-like pseudocode: applies the function "sum 2" to each element of the list
(defun sum [x y] (+ x y))
(map (sum 2) list)

// the same in PHP, with autoCurry
$sum = autoCurry(function($x, $y) { return $x + $y; });
array_map($sum(2), $list);

How is autoCurry implemented? By writing a little Lisp interpreter in your own programming language, one that uses reflection to find out the length of the argument list. Once this information is known, it dynamically switches its behavior between full calls and partial ones, producing a new function in the latter case.
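
The mechanism just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the library's actual code: auto_curry() is a hypothetical name, and Closure::fromCallable() assumes PHP 7.1 or later.

```php
<?php
// Sketch of an auto-currying wrapper. auto_curry() is a hypothetical name,
// not the API of the autoCurry library.
function auto_curry(callable $function, array $collected = []): Closure
{
    // Reflection tells us how many arguments the wrapped function requires.
    $reflection = new ReflectionFunction(Closure::fromCallable($function));
    $required = $reflection->getNumberOfRequiredParameters();

    return function (...$arguments) use ($function, $collected, $required) {
        $all = array_merge($collected, $arguments);
        if (count($all) >= $required) {
            // Enough arguments have been collected: perform the full call.
            return $function(...$all);
        }
        // Too few arguments: return a new function that remembers them.
        return auto_curry($function, $all);
    };
}

$sum = auto_curry(function ($x, $y) { return $x + $y; });
echo implode(', ', array_map($sum(2), [1, 2, 3])), "\n"; // prints 3, 4, 5
```

Calling $sum(2, 3) takes the full-call branch, while $sum(2) takes the partial branch and hands back a closure that still waits for the second argument.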

Greenspun's Tenth Rule describes a similar scenario: any sufficiently complicated program contains an ad hoc, slow, bug-ridden implementation of half of Common Lisp. Here we have instead a configuration format that will grow until it contains a slow, bug-ridden implementation of half of PHP's syntax.

Properties of code

Why should highly complex configuration be written in PHP (or Ruby) instead of in a declarative language? An imperative programming language has several interesting properties.

  • Robustness: when there are errors in the configuration, they are the ordinary errors raised by the PHP parser and runtime. For example, an undefined variable or a typo produces the same errors you have seen for 10 years and that have been worked on to become more readable (PAAMAYIM_NEKUDOTAYIM excluded). Checks like type hints work very well for DI configuration.
  • Diffusion: every PHP developer knows how to write configuration files in PHP, and how to interpret what they are doing.
  • Support: you don't need to import any library to parse it, just some require_once() statements.
  • Existing tools: it's easy to see with a profiler what takes long to create in the configuration, and its parsing can be greatly sped up with Opcache or APC. These are existing tools you probably already use for your own code.
  • Duplication removal: it lets you use all the available tools to fight duplication, such as shared variables, immutable objects, anonymous functions, and array generation.
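
Several of these properties can be seen together in a small dependency-injection configuration. The following is a minimal sketch: HttpClient and ResourceGateway are hypothetical classes invented for illustration, and the scalar and return type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical classes standing in for application services.
final class HttpClient
{
    public $baseUrl;

    public function __construct(string $baseUrl)
    {
        $this->baseUrl = $baseUrl;
    }
}

final class ResourceGateway
{
    public $client;
    public $path;

    // The type hint makes a misconfigured dependency fail with an ordinary TypeError.
    public function __construct(HttpClient $client, string $path)
    {
        $this->client = $client;
        $this->path = $path;
    }
}

// Shared variable: the base URL is written once.
$baseUrl = 'http://example.com';

// Anonymous function: one factory removes the duplication between similar entries.
$gateway = function (string $path) use ($baseUrl): ResourceGateway {
    return new ResourceGateway(new HttpClient($baseUrl), "/api/$path");
};

// Array generation: the service list is computed instead of copied and pasted.
$services = array_map($gateway, ['resource', 'resource/subresource']);

echo $services[1]->client->baseUrl . $services[1]->path, "\n";
// prints http://example.com/api/resource/subresource
```

Passing anything other than an HttpClient as the first constructor argument is rejected by the language itself, with no validation code in the configuration.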

On the other hand, a PHP configuration file brings a higher cognitive load, because you know you're looking at something dynamic when reading it. It also requires some discipline, as you have to make sure everyone avoids network calls and other external dependencies inside it.

Conclusions

Inventing a new complex language in order to write complex configuration is, in my opinion, solving the wrong problem. We already have a solution, plain old PHP code, which solves 80% of the use cases easily and makes the rest possible.

Expression languages add accidental complexity to a problem whose solution is almost never a competitive advantage for the application. Whenever a proposed solution is to add more code, we can reflect on it and see whether we can reach an acceptable result without adding moving parts. Every new piece of an application is a new possible failure mode: even open source code is free to acquire, but not free to maintain and understand.

Published at DZone with permission of Giorgio Sironi, author and DZone MVB.


Comments

Mario T. replied on Wed, 2013/11/20 - 11:24pm

This article is spot on. As development progresses, feature and configuration requirements inevitably grow. It's not just deployment variance, but also basic application-behaviour adjustments.

While INI- or XML-style configuration files make sense at first, non-programmers shy away from editing them anyway (and there aren't many nice frontends either). Some things shouldn't be touched by end users, others should. Segregating database configurations, path definitions, and application or user settings between plain PHP source config files, ini/xml files, and/or the database is then also commonplace; a more universal approach reduces complexity. Configuration in code is often sufficient and avoids the progression toward an ini-config sub-language discussed here.

The perceived non-editability of source code configurations is not a dealbreaker in my book. For smaller projects I use mostly-editable config.php scripts. Non-coders can still edit literals from a settings UI without breaking their constrained syntax, while more complex expressions, resolver logic, or plain initialization code can be mixed in freely. So there's one plain-code settings location that lessens the differentiation between programmers and end users. It adds management code elsewhere, but reduces interference and overhead for the application itself, and seems the simplest thing that could possibly work.
