I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world.

Cloning in PHP

05.15.2013

Cloning is the duplication of a data structure, usually to avoid the aliasing problem: different pieces of code modifying the same instance in inconsistent ways.

In PHP, cloning can be accomplished in multiple ways, and in some cases it can be avoided altogether.

By value, by reference or by handle?

PHP's primitive data types, meaning all scalars (integers, floats, strings, booleans) and arrays, are passed by value between functions and methods. This doesn't mean they are copied into new memory each time, since a copy-on-write mechanism is in place, but conceptually you can behave as if that were true. As such, you never need to clone a scalar or an array of scalars: they leave the current scope only as a copy, which cannot affect the original value.
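The by-value behaviour described above can be checked with a minimal sketch like the following. The function appendValue() and the variable names are invented for this illustration.

```php
<?php
// Hypothetical helper: receives the array by value and modifies its own copy.
function appendValue(array $values, $value)
{
    $values[] = $value; // copy-on-write: the actual copy happens only here
    return $values;
}

$original = array(1, 2, 3);
$modified = appendValue($original, 4);

var_dump(count($original)); // int(3): the caller's array is untouched
var_dump(count($modified)); // int(4)
```

The caller never has to copy $original defensively before the call, since the function cannot reach it.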

Two cases deserve attention. When you need a copy for an immediate destructive computation, no special construct is required, as plain assignment already yields an independent copy:

public function myMethod()
{
    $values = $this->arrayOfIntegers;
    $values[] = rand(1, 42);
    return $values; // $this->arrayOfIntegers is still the same for the next round
}

Moreover, some native functions accept parameters only by reference:

$values = $this->arrayOfIntegers;
sort($values);

As such, these functions must be passed a copy of the original data structure if you do not want it to be modified. You can find out whether a PHP function accepts parameters by reference from its documentation, where such parameters are marked with a &:

int preg_match ( string $pattern , string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] )

resource proc_open ( string $cmd , array $descriptorspec , array &$pipes [, string $cwd [, array $env [, array $other_options ]]] )

Unlike sort() and its siblings, however, most of these functions completely overwrite their output arguments, so they are normally passed an empty array rather than an existing data structure.
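As a small illustration of an output argument being overwritten, the sketch below passes preg_match() an array that already holds some values. The pattern and the subject string are made up for the example.

```php
<?php
$matches = array('stale', 'content', 'to', 'discard');

// preg_match() fills $matches from scratch; the previous content is lost.
$found = preg_match('/(\d+)-(\d+)/', 'range 10-20', $matches);

var_dump($found);          // int(1)
var_dump(count($matches)); // int(3): the full match plus two groups
var_dump($matches[1]);     // string(2) "10"
```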

Objects are passed by handle in PHP 5.x: what gets copied between functions is just a pointer to the object, not the object itself. So if you pass an ArrayObject or any other object around, it can be modified by any code that holds a reference to it, either as a method parameter or in a private field.
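The difference from arrays can be seen in a short sketch. The functions addItem() and replaceItems() are hypothetical names chosen for this example.

```php
<?php
// Receives a copy of the pointer to the object, not a copy of the object.
function addItem(ArrayObject $items, $item)
{
    $items->append($item); // modifies the single shared instance
}

// Reassigning the parameter only changes the local variable.
function replaceItems(ArrayObject $items)
{
    $items = new ArrayObject();
}

$items = new ArrayObject(array('a'));
addItem($items, 'b');
var_dump(count($items)); // int(2): the caller sees the change

replaceItems($items);
var_dump(count($items)); // int(2): still the original object
```

The second call shows why this is not the same as passing by reference: the callee can mutate the object, but cannot make the caller's variable point elsewhere.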

The clone keyword

PHP has a construct for performing shallow cloning:

<?php
$object = new stdClass;
$object->collaborator = new stdClass;
$clone = clone $object;
var_dump($clone === $object);
var_dump($clone->collaborator === $object->collaborator);

$ php shallow_cloning.php
bool(false)
bool(true)

clone duplicates only the object you pass to it, not the rest of the graph it is attached to. Scalars and arrays contained in the object are duplicated, as they are copied by value; other contained objects, however, are not cloned by default, so the copy and the original share them.

You can define a __clone() magic method to clone internal objects as well. This method is called on the new object after it has been created with the default strategy, and should clone any field that still points to an object held by the original instance:

public function __clone()
{
    $this->collaborator = clone $this->collaborator;
}

The class of $this->collaborator may have a __clone() method too, and so on through the rest of the object graph.

This mechanism is the most costly to write, but also the most flexible, as it gives you complete control over what to duplicate and what to share; for example, you may decide to completely clone an object A and its collaborator B, but not the database connection that B holds in a field.
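The A and B scenario just described could look like the sketch below. All the classes are hypothetical, and Connection is an empty class standing in for a real database connection.

```php
<?php
class Connection {} // placeholder for a shared resource

class B
{
    public $connection;
    public $settings;

    public function __construct(Connection $connection)
    {
        $this->connection = $connection;
        $this->settings = new stdClass;
    }

    public function __clone()
    {
        // Duplicate the mutable state, but keep sharing $this->connection.
        $this->settings = clone $this->settings;
    }
}

class A
{
    public $b;

    public function __construct(B $b)
    {
        $this->b = $b;
    }

    public function __clone()
    {
        $this->b = clone $this->b; // triggers B::__clone() in turn
    }
}

$a = new A(new B(new Connection));
$copy = clone $a;

var_dump($copy->b === $a->b);                         // bool(false)
var_dump($copy->b->settings === $a->b->settings);     // bool(false)
var_dump($copy->b->connection === $a->b->connection); // bool(true)
```

Public fields are used here only to keep the checks short; the same __clone() methods work with private fields.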

Serialization

If you just want to perform a complete clone of a small object graph, serialization is the fastest mechanism (in terms of programmer time, not of measured performance).

<?php
$object = new stdClass;
$object->collaborator = new stdClass;
$clone = unserialize(serialize($object));
var_dump($clone === $object);
var_dump($clone->collaborator === $object->collaborator);

$ php serialization.php
bool(false)
bool(false)

Serialization transforms the object graph into a string and back into PHP objects, with the same internal private state and object links that the original instances had. It also copes with cycles in the object graph. Large object graphs, however, shouldn't be cloned completely with this method: this time we really are duplicating memory, as brand new instances are called into existence.

There is also a performance overhead for each clone operation performed through serialization: make sure to profile, in order to verify that you're not cloning Zend Framework itself in every PHP process of your application. Besides this, some objects cannot be serialized at all (e.g. PDO connections).
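The handling of cycles can be verified with a minimal sketch; the $parent and $child objects are invented for this example.

```php
<?php
$parent = new stdClass;
$child = new stdClass;
$parent->child = $child;
$child->parent = $parent; // cycle: parent -> child -> parent

$clone = unserialize(serialize($parent));

var_dump($clone === $parent);               // bool(false): a new instance
var_dump($clone->child === $child);         // bool(false): the collaborator is new too
var_dump($clone->child->parent === $clone); // bool(true): the cycle is rebuilt inside the copy
```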
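The failure mode for non-serializable objects can be reproduced without a database. In the sketch below a closure stands in for a PDO connection, which is an assumption made only to keep the example self-contained; closures are another kind of object that PHP refuses to serialize.

```php
<?php
$object = new stdClass;
$object->callback = function () {
    return 42;
};

try {
    $clone = unserialize(serialize($object));
    $failed = false;
} catch (Exception $e) {
    // serialize() rejects the whole graph because of the closure it contains.
    $failed = true;
}

var_dump($failed); // bool(true)
```

A single non-serializable object anywhere in the graph is enough to make this cloning strategy unusable for that graph.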

A radical conclusion

The most radical solution to the cloning problem is to design with immutable Value Objects:

class A
{
    private $b;
    public function __construct(B $b)
    {
        $this->b = $b;
    }
    // no other method modifies $b, and B is immutable in the same way.
}

Value Objects can be passed around without cloning thanks to their immutability, which makes them safe from aliasing. They produce a new object when they are asked to modify themselves, so the overhead is paid lazily, upon modification instead of upon passing. Note also that arrays of Value Objects can be passed around without cloning, since the array is copied by value while its contents are immutable.
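A Value Object that returns a new instance instead of modifying itself might be sketched as follows. The Money class and its methods are hypothetical, chosen only to illustrate the pattern.

```php
<?php
class Money
{
    private $amount;

    public function __construct($amount)
    {
        $this->amount = $amount;
    }

    public function getAmount()
    {
        return $this->amount;
    }

    // "Modification" builds a new instance and leaves $this untouched.
    public function add(Money $other)
    {
        return new self($this->amount + $other->amount);
    }
}

$price = new Money(10);
$total = $price->add(new Money(5));

var_dump($price->getAmount()); // int(10)
var_dump($total->getAmount()); // int(15)
```

Any code holding $price can keep relying on its value, no matter how many other objects received the same instance.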

Published at DZone with permission of Giorgio Sironi, author and DZone MVB.
