I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world. Giorgio is a DZone MVB and is not an employee of DZone and has posted 635 posts at DZone. You can read more from them at their website.

How to correctly work with PHP serialization

08.27.2012

PHP can automatically serialize most of its variables to strings, which lets you save them into storage such as $_SESSION. However, there are some details you have to know to avoid crashing scripts and performance problems.

Primitives

The serialize() function takes a PHP variable as its argument and returns a string from which that variable can be reconstructed:
$ php -r 'var_dump(serialize(42));'
string(5) "i:42;"
unserialize($string) performs the opposite job:
$ php -r 'var_dump(unserialize("i:42;"));'
int(42)

The serialization process is automatic, and you do not have to implement any marker interface. It works on scalars, arrays and objects alike.
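Here is a minimal round trip that uses only the standard serialize() and unserialize() functions. An array containing an object is turned into a string and back. The copy compares equal (==) to the original, but it holds a distinct stdClass instance.

```php
<?php
// A value mixing scalars, a nested array and an object.
$point = new stdClass;
$point->x = 1;
$point->y = 2;
$original = array('label' => 'origin', 'tags' => array('a', 'b'), 'point' => $point);

// serialize() produces a plain string that can be stored anywhere...
$string = serialize($original);

// ...and unserialize() rebuilds an equivalent value from it.
$copy = unserialize($string);

var_dump($copy == $original);                    // bool(true): same structure and values
var_dump($copy['point'] === $original['point']); // bool(false): a new stdClass instance
```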

Note, however, that serialization follows the references stored in object fields:

$ php -r '$object = new stdClass; $object->field = new stdClass; var_dump(serialize($object));'
string(50) "O:8:"stdClass":1:{s:5:"field";O:8:"stdClass":0:{}}"

This means that if your serialized objects reference ORM or framework objects, you are likely to drag large parts of the library's object graph along with them. A simple way to avoid this serialization dependency is to pass the collaborator as a method argument (on the stack) instead of storing it in one of the object's fields:

class MyObject
{
    public function doSomeWork(Zend_View $view) {
        // refer to $view instead of $this->view
    }
}
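The following sketch makes the difference concrete. It uses hypothetical classes: HeavyCollaborator stands in for something like a Zend_View or an ORM entity manager, and the two report classes differ only in whether they keep that collaborator in a field. Comparing the lengths of the serialized strings shows that the collaborator travels along only in the first case.

```php
<?php
// Hypothetical stand-in for a large framework object.
class HeavyCollaborator
{
    public $cache;

    public function __construct()
    {
        // Simulate a lot of internal state.
        $this->cache = array_fill(0, 1000, 'some cached entry');
    }
}

// Keeps the collaborator in a field: it is serialized together with the report.
class FieldReport
{
    private $title = 'Monthly report';
    private $collaborator;

    public function __construct(HeavyCollaborator $collaborator)
    {
        $this->collaborator = $collaborator;
    }
}

// Receives the collaborator on the stack: only its own state is serialized.
class ArgumentReport
{
    private $title = 'Monthly report';

    public function render(HeavyCollaborator $collaborator)
    {
        return $this->title . ' (' . count($collaborator->cache) . ' entries)';
    }
}

$heavy = new HeavyCollaborator();
$withField = serialize(new FieldReport($heavy));
$withArgument = serialize(new ArgumentReport());

echo strlen($withField), "\n";    // much larger: the whole cache came along
echo strlen($withArgument), "\n"; // only the title is stored
```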

Forbidden types

Some variable types cannot be easily ported from one OS process to another, and as such are unsuitable for serialization:

  • variables of type resource (open files, streams, old-style connections)
  • objects that store resources inside them (a PDO instance representing a database connection)
  • closures (PHP refuses to serialize them, probably because of the code they wrap and the variables they capture with the use clause)
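A quick check with built-in types shows that PHP does not treat these cases uniformly. An object holding a closure makes serialize() throw an Exception. A stream resource does not throw at all: it is silently stored as the integer 0, so the open handle is lost.

```php
<?php
// An object holding a closure cannot be serialized: serialize() throws.
$holder = new stdClass;
$holder->callback = function ($x) { return $x * 2; };

$failed = false;
try {
    serialize($holder);
} catch (Exception $e) {
    $failed = true;
    // Typically "Serialization of 'Closure' is not allowed".
    echo get_class($e), ': ', $e->getMessage(), "\n";
}

// A resource does not throw: it is silently stored as the integer 0,
// so the open stream is lost along the way.
$stream = fopen('php://memory', 'r+');
$encoded = serialize($stream);
fclose($stream);
var_dump($encoded); // string(4) "i:0;"
```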

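Every object that composes one of these values inherits the same problem, but the problem shows up in two different ways. Closures and objects such as PDO make serialize() throw an exception. Resources, instead, are silently serialized as the integer 0, so the handle is lost without any error. Only in these cases do you have to resort to custom solutions like the two that follow.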

__sleep() and __wakeup()

This pair of optional magic methods lets you tell PHP to serialize just part of an object, resulting in an implementation of the Memento pattern.

__sleep() should return an array of strings: the names of the fields that represent the state of the object you want to store. From the PHP manual:

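class Connection
{
    protected $connection;                      // live connection: not serialized
    private $server, $username, $password, $db;

    public function __sleep()
    {
        // only these fields end up in the serialized string
        return array('server', 'username', 'password', 'db');
    }

    // ... rest of the class omitted
}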

This method is called during serialization to determine which fields go into the representation. Anything not listed is left out, for example the open connection or any closures you stored on $this.
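The same idea is shown here as a runnable sketch, built around a hypothetical Greeter class that stores a closure next to its real state. Because __sleep() leaves the closure out, serialization succeeds. After unserialize(), the excluded field simply holds its declared default, which is null here.

```php
<?php
class Greeter
{
    private $name;
    private $formatter; // a closure: must not be serialized

    public function __construct($name)
    {
        $this->name = $name;
        $this->formatter = function ($text) { return strtoupper($text); };
    }

    public function __sleep()
    {
        // Only 'name' is part of the serialized state.
        return array('name');
    }

    public function greet()
    {
        $text = 'hello ' . $this->name;
        // The formatter is null after unserialize(), so fall back to plain text.
        return $this->formatter ? call_user_func($this->formatter, $text) : $text;
    }
}

$greeter = new Greeter('giorgio');
$copy = unserialize(serialize($greeter));

echo $greeter->greet(), "\n"; // HELLO GIORGIO
echo $copy->greet(), "\n";    // hello giorgio
```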

__wakeup() is called instead of the constructor after deserialization, so that the object can re-establish its links to the current process (for example, by reopening a connection). It takes no arguments, so if you define it you will have to grab the collaborators from some global state (a singleton, a static or a global variable) or recreate them yourself.
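The following sketch shows __wakeup() recreating its collaborator by itself, since the method receives no arguments. It assumes a hypothetical StreamHolder class that wraps a php://temp stream. The stream handle is excluded by __sleep(), and __wakeup() reopens it.

```php
<?php
class StreamHolder
{
    private $uri;
    private $handle; // a resource: excluded from serialization

    public function __construct($uri)
    {
        $this->uri = $uri;
        $this->open();
    }

    private function open()
    {
        $this->handle = fopen($this->uri, 'w+');
    }

    public function __sleep()
    {
        return array('uri');
    }

    public function __wakeup()
    {
        // Called after unserialize() instead of the constructor:
        // reopen the stream in the current process.
        $this->open();
    }

    public function hasHandle()
    {
        return is_resource($this->handle);
    }
}

$holder = new StreamHolder('php://temp');
$serialized = serialize($holder);
$restored = unserialize($serialized);
var_dump($restored->hasHandle()); // bool(true)
```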

An alternative to a __wakeup() method is a Repository object that passes in the collaborators during reconstitution, before returning a valid object.
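Here is a minimal sketch of the Repository alternative. All names are hypothetical: Report, Formatter, ReportRepository and its in-memory storage array. The repository keeps serialized strings and injects the collaborator into each object it reconstitutes, so the domain object needs no global state.

```php
<?php
// Hypothetical collaborator that should not be serialized.
class Formatter
{
    public function format($text)
    {
        return '[' . $text . ']';
    }
}

class Report
{
    private $title;
    private $formatter;

    public function __construct($title)
    {
        $this->title = $title;
    }

    public function __sleep()
    {
        return array('title');
    }

    // Called by the repository during reconstitution.
    public function setFormatter(Formatter $formatter)
    {
        $this->formatter = $formatter;
    }

    public function render()
    {
        return $this->formatter->format($this->title);
    }
}

// Hypothetical repository storing serialized Reports in memory.
class ReportRepository
{
    private $storage = array();
    private $formatter;

    public function __construct(Formatter $formatter)
    {
        $this->formatter = $formatter;
    }

    public function save($id, Report $report)
    {
        $this->storage[$id] = serialize($report);
    }

    public function find($id)
    {
        $report = unserialize($this->storage[$id]);
        // Pass in the collaborator before handing out the object.
        $report->setFormatter($this->formatter);
        return $report;
    }
}

$repository = new ReportRepository(new Formatter());
$repository->save(1, new Report('Q3 sales'));
echo $repository->find(1)->render(), "\n"; // [Q3 sales]
```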

Serializable interface

This interface is an alternative to __sleep(). It lets you decide programmatically what to serialize, for example by calling serialize() on a subset of the object's fields. Again, from the PHP manual:

class obj implements Serializable {
    private $data;
    public function __construct() {
        $this->data = "My private data";
    }
    public function serialize() {
        return serialize($this->data);
    }
    public function unserialize($data) {
        $this->data = unserialize($data);
    }
}

The interface is an alternative to __sleep() in the sense that you can use only one mechanism or the other, not both: when a class implements Serializable, PHP no longer calls its __sleep() and __wakeup() methods.
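The sketch below is built around a hypothetical Cart class. It serializes only its items and leaves out a closure. It also defines a __sleep() that throws, to show that PHP does not call __sleep() on a class implementing Serializable. On PHP 8.1 and later, implementing Serializable without also defining __serialize() and __unserialize() emits a deprecation notice, but the code still runs.

```php
<?php
class Cart implements Serializable
{
    private $items = array();
    private $onChange; // a closure: kept out of the serialized form

    public function __construct()
    {
        $this->onChange = function () {};
    }

    public function add($item)
    {
        $this->items[] = $item;
        call_user_func($this->onChange);
    }

    public function items()
    {
        return $this->items;
    }

    public function __sleep()
    {
        // Never reached: Serializable takes precedence over __sleep().
        throw new LogicException('__sleep() should not be called');
    }

    public function serialize()
    {
        // Serialize only a subset of the fields.
        return serialize($this->items);
    }

    public function unserialize($data)
    {
        $this->items = unserialize($data);
        $this->onChange = function () {}; // recreate the transient closure
    }
}

$cart = new Cart();
$cart->add('book');
$string = serialize($cart); // custom "C:" format produced by Cart::serialize()
$copy = unserialize($string);
var_dump($copy->items() === array('book')); // bool(true)
```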

Under the hood

You need to know what it takes to correctly serialize an object because serialization can happen without your knowledge. For example, everything saved in $_SESSION is serialized so that it can be carried over to the following requests, which may be served by different processes.
This happens at the end of the script, outside of the normal flow of execution. If an unserializable field (like a closure) inside your $_SESSION['state'] object causes an error, you'll see a message such as:

Fatal error: Exception thrown without a stack frame in Unknown on line 0

in your logs. This is very difficult to debug if you do not know what is kept in the session (or what is going on at all). If you have a working version of the code (such as the previous commit), run a manual test through Apache. Then look at /var/lib/php5 (or whatever session.save_path points to) to see the existing sessions on the server and the serialized objects inside them. This gives you a feel for what is being kept in the session and helps you find the culprit object. Since all field references are followed, the culprit may be far away in the object graph from what you put into $_SESSION.
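The search for the culprit can also be partly automated. findUnserializable() is a hypothetical debugging helper, not part of PHP, and it assumes PHP 7.2 or later. It tries serialize() on a value; on failure, it descends into array elements and object fields and reports the path of the deepest value that throws. It reads object fields through an (array) cast, which also exposes private and protected ones. It assumes plain objects: classes with __sleep() or Serializable may exclude fields that the helper still inspects. Resources are not reported, because they do not throw.

```php
<?php
// Hypothetical helper: returns a path such as '$value->user->callback'
// pointing at the value that makes serialize() throw, or null if none does.
function findUnserializable($value, $path = '$value', array &$seen = array())
{
    try {
        serialize($value);
        return null; // this whole subtree serializes fine
    } catch (Throwable $e) {
        // something below this point fails: look for a more specific culprit
    }

    if (is_object($value)) {
        $id = spl_object_id($value);
        if (isset($seen[$id])) {
            return null; // already being inspected (cyclic graph)
        }
        $seen[$id] = true;
    }

    if (is_array($value) || is_object($value)) {
        // An (array) cast exposes every field; private and protected names
        // come prefixed by a NUL-delimited marker, stripped here for display.
        foreach ((array) $value as $key => $child) {
            $name = (string) $key;
            if ($name !== '' && $name[0] === "\0") {
                $name = substr($name, strrpos($name, "\0") + 1);
            }
            $childPath = is_object($value) ? "{$path}->{$name}" : "{$path}['{$name}']";
            $found = findUnserializable($child, $childPath, $seen);
            if ($found !== null) {
                return $found;
            }
        }
    }

    // No child is to blame, so this value itself cannot be serialized.
    return $path;
}

// Example: a closure buried two levels below what goes into $_SESSION.
$state = new stdClass;
$state->user = new stdClass;
$state->user->name = 'giorgio';
$state->user->callback = function () {};

var_dump(findUnserializable($state)); // string(22) "$value->user->callback"
```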

Published at DZone with permission of Giorgio Sironi, author and DZone MVB.
