I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world. Giorgio is a DZone MVB and is not an employee of DZone and has posted 635 posts at DZone.

Practical PHP Refactoring: Introduce Explaining Variable

06.27.2011

Today's scenario: you have a complex expression, longer than 80-100 characters and hard to understand. Here we can apply the classic maxim divide et impera: break the logic down into smaller, digestible chunks.

This time we don't extract a method (at least not yet); we refactor in the small, introducing only new local variables.

I'm sure you are already thinking about cases where a method is not warranted but there is room for improvement in readability. For example, complex expressions are often found:

  • as conditions in if() and as parts of the ?: operator in PHP and other C-like languages;
  • as arithmetic expressions without intermediate results;
  • as long concatenations of strings.
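
As a minimal sketch of the first case, consider a condition that packs a whole business rule into one line. The order data, the free-shipping rule, and all variable names here are hypothetical, invented only for illustration:

```php
<?php
// Hypothetical order data, invented for this illustration.
$order = array('total' => 120.0, 'country' => 'IT', 'items' => 3);

// Before: the whole rule lives inside the condition.
if ($order['total'] > 100 && $order['items'] >= 2 && in_array($order['country'], array('IT', 'FR', 'DE'))) {
    $shippingBefore = 0.0;
} else {
    $shippingBefore = 9.9;
}

// After: each part of the rule gets a name that says what it means.
$isLargeOrder = $order['total'] > 100 && $order['items'] >= 2;
$shipsWithinEurope = in_array($order['country'], array('IT', 'FR', 'DE'));
$qualifiesForFreeShipping = $isLargeOrder && $shipsWithinEurope;
$shippingAfter = $qualifiesForFreeShipping ? 0.0 : 9.9;
```

Both versions compute the same shipping cost; the second one just lets you read the rule without parsing the boolean expression in your head.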

Introduce Explaining Variable trades a little indirection for better readability.

Why should I introduce a variable?

This refactoring is very common according to Fowler, but often superseded by Extract Method. It is helpful when you are not ready to extract a method, for several reasons:

  • an unclear or very large set of parameters;
  • many assignments, so that a single return value won't cut it;
  • the necessity to keep the scope local to reuse intermediate results, because they are heavy to compute or we just don't want to repeat ourselves.
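
The last point can be sketched with a small price calculation. The function name, the discount rule, and the figures are assumptions made up for this example; what matters is that $basePrice is computed once and reused by the lines that follow it:

```php
<?php
// Hypothetical function and pricing rules, invented for this illustration.
function invoiceTotal($quantity, $unitPrice)
{
    // Intermediate results stay local and are reused below,
    // so the multiplication is neither repeated nor recomputed.
    $basePrice = $quantity * $unitPrice;
    $volumeDiscount = max(0, $quantity - 500) * $unitPrice * 0.05;
    $shipping = min($basePrice * 0.1, 100.0);

    return $basePrice - $volumeDiscount + $shipping;
}
```

Extracting three methods here would mean passing $quantity and $unitPrice around several times; three local variables keep everything in one place until the design is clearer.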

An advantage of methods is that they are reusable and better documented, since they have not only a name but also parameters with explanatory names and docblocks. But an intermediate variable is far simpler to introduce, especially if you're already busy untangling spaghetti.

Steps

  1. Insert a new line before the current complex code. Copy the sub-expression there, and assign it to the new variable.
  2. Replace the expression you have copied with the new variable, in all the occurrences after the newly inserted line.
  3. Check that the test suite is still green.
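
The steps above can be sketched on a long string concatenation. The user record and the variable names are hypothetical, and the final comparison is only a stand-in for a real test suite:

```php
<?php
// Hypothetical user record, invented for this illustration.
$user = array('first' => 'ada', 'last' => 'lovelace', 'domain' => 'example.com');

// Starting point: one long concatenation.
$before = ucfirst($user['first']) . ' ' . ucfirst($user['last']) . ' <' . $user['first'] . '.' . $user['last'] . '@' . $user['domain'] . '>';

// Step 1: copy each sub-expression onto its own line and assign it to a new variable.
$fullName = ucfirst($user['first']) . ' ' . ucfirst($user['last']);
$address = $user['first'] . '.' . $user['last'] . '@' . $user['domain'];

// Step 2: replace the copied sub-expressions with the new variables.
$after = $fullName . ' <' . $address . '>';

// Step 3: stand-in for the test suite, checking that the result did not change.
var_dump($before === $after);
```

In real code you would edit the original line in place rather than keep both versions; they are shown side by side here only so that the comparison can run.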

Try to choose a good name: it's the only documentation for the variable, apart from comments (but we don't like comments, right?). Since the scope of the variable is strictly local, you are free to rename it later or to choose a long, really explanatory name.

Example

Today I've implemented an algorithm from machine learning, a Bayes classifier. It's commonly used, for example, to separate spam from ordinary mail. It is not as well-known as sorting, so the difficulty of reading the code shows. It's a realistic example of code you have to read, because in most cases you do not know how it works (otherwise, why read it?).

Initially this algorithm is implemented without any intermediate variable. But we are lucky, because foreach() by design already introduces temporary variables as key and value.

<?php
class Test extends PHPUnit_Framework_TestCase
{
    public function test()
    {
        $classifier = new BayesClassifier();
        $classifier->setPriors(array('spam' => 0.2, 'notspam' => 0.8));
        $classifier->addFeature('free', array(0.9, 0.1));
        $classifier->addFeature('win', array(0.99, 0.01));
        $classifier->addFeature('money', array(0.8, 0.2));
        $classifier->addFeature('social', array(0.9, 0.1));
        $classifier->addFeature('prince', array(0.9, 0.1));
        $classifier->addFeature('likelihood', array(0.01, 0.99));
        $classifier->addFeature('hitchhiker', array(0.1, 0.9));
        $result = $classifier->classify('win free money by sending $10,000 to a Nigerian prince through our social network!');
        $this->assertEquals('spam', $result);
        // forgive me for putting everything in one test for brevity
        $result = $classifier->classify('have you finished studying Bayes and maximum likelihood?');
        $this->assertEquals('notspam', $result);
    }
}

class BayesClassifier
{
    private $priors;
    private $features;

    public function setPriors($priors)
    {
        $this->priors = $priors;
    }

    public function addFeature($word, $likelihoods)
    {
        $this->features[$word] = $likelihoods;
    }

    public function classify($mailText)
    {
        $discriminators = array();
        foreach ($this->priors as $result => $prior) {
            $discriminators[$result] = $prior;
        }
        $words = explode(' ', $mailText);
        foreach ($words as $word) {
            if (isset($this->features[$word])) {
                $i = 0;
                foreach ($this->priors as $result => $prior) {
                    $discriminators[$result] = $discriminators[$result] * $this->features[$word][$i];
                    $i++;
                }
            }
        }
        return array_search(max($discriminators), $discriminators);
    }
}

In a few simple steps, I introduce $likelihoods, $possibleOutcome, and some other variables. The code is now a bit easier to read, although it probably also needs some methods to be extracted. But this is a first step. The trade-off is that extracting a method is more invasive and can break the code, while introducing a variable works on a smaller scale and is unlikely to stop the code from working.

These new variables won't explain the whole algorithm, but their names are a starting point for the introduction of a Ubiquitous Language.

class BayesClassifier
{
    private $priors;
    private $features;

    public function setPriors($priors)
    {
        $this->priors = $priors;
    }

    public function addFeature($word, $likelihoods)
    {
        $this->features[$word] = $likelihoods;
    }

    public function classify($mailText)
    {
        $discriminators = array();
        foreach ($this->priors as $result => $prior) {
            $discriminators[$result] = $prior;
        }
        $words = explode(' ', $mailText);
        foreach ($words as $word) {
            if (isset($this->features[$word])) {
                $likelihoods = $this->features[$word];
                $possibleOutcome = 0;
                foreach ($this->priors as $result => $prior) {
                    $adjustment = $likelihoods[$possibleOutcome];
                    $discriminators[$result] = $discriminators[$result] * $adjustment;
                    $possibleOutcome++;
                }
            }
        }
        $highestLikelihood = max($discriminators);
        $mostLikelyOutcome = array_search($highestLikelihood, $discriminators);
        return $mostLikelyOutcome;
    }
}
Published at DZone with permission of Giorgio Sironi, author and DZone MVB.