I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world. Giorgio is a DZone MVB and is not an employee of DZone and has posted 636 posts at DZone.

Practical PHP Patterns: Lazy Loading

05.26.2010

Lazy initialization is a common technique in programming, especially in PHP applications, where during any single HTTP request it is practically certain that some resources will not be used at all. If loaded eagerly, they would simply be discarded without ever being referenced.
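
As a minimal sketch of plain lazy initialization (the ExpensiveResource and ReportService names are hypothetical and do not come from any library), the costly object is built on first use only:

```php
<?php
// Hypothetical resource that is costly to build (for example a connection).
class ExpensiveResource
{
    public static $instances = 0;

    public function __construct()
    {
        self::$instances++; // count how many times the cost is really paid
    }

    public function query($sql)
    {
        return 'result of: ' . $sql;
    }
}

// Hypothetical service that holds the resource lazily.
class ReportService
{
    private $resource = null;

    private function resource()
    {
        // Build the resource on first access only, then reuse it.
        if ($this->resource === null) {
            $this->resource = new ExpensiveResource();
        }
        return $this->resource;
    }

    public function report()
    {
        return $this->resource()->query('SELECT 1');
    }
}

$service = new ReportService();
$createdBeforeUse = ExpensiveResource::$instances; // nothing built yet
$service->report();
$service->report();
$createdAfterUse = ExpensiveResource::$instances;  // built once, then reused
```

A request that never calls report() never pays for the resource at all.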

The Lazy Loading pattern for stateful objects is a transparent solution for loading entity objects on demand, when they are referenced throughout client code. It is typically used in the internals of an ORM or of another kind of Data Mapper. Lazy Loading is not a mapper-specific pattern, but among enterprise patterns this is the best example of its application.

Intent

In general, the object graph managed by a Data Mapper can be arbitrarily large, and loading the entire graph to satisfy a request that usually involves only a small subset of it is a waste. This is the case with many-to-many bidirectional relationships, such as the Unix users and groups model. For example, loading a User also brings back the groups it references, which in turn reference the users they contain, and so on. In general, it is not always possible to break the relationships in one direction to simplify the model. In this case, full loading is not merely a waste: it is not feasible unless you have gigabytes of RAM to allocate for every request.

Another use case for lazy loading of domain model entities is arbitrary computation, where there is no a priori knowledge of how much of the object graph will be reached by method calls (or field accesses). For example, we may want to run a statistical analysis of the users reachable from a particular starting point through group relationships, and stop the search once we have reached a predetermined number of users. In this kind of computation we cannot know in advance how many JOINs we would have to perform, so it is an example of a query that falls outside the scope of SQL.
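
The bounded traversal described above can be sketched as follows. This is an illustration under stated assumptions: the in-memory $memberships array stands in for the database, and the $loadMembers closure stands in for the lazy load of one group's users; none of these names belong to an ORM.

```php
<?php
// Hypothetical in-memory "database": which users belong to which groups.
$memberships = array(
    'admins' => array('alice', 'bob'),
    'devs'   => array('bob', 'carol', 'dave'),
    'ops'    => array('dave', 'erin'),
);

$loadedGroups = array(); // records which groups were actually "queried"

// Simulates one query per group; called only when a group is reached.
$loadMembers = function ($group) use ($memberships, &$loadedGroups) {
    $loadedGroups[] = $group;
    return $memberships[$group];
};

// Groups a user belongs to, derived from the same data.
$groupsOf = function ($user) use ($memberships) {
    $groups = array();
    foreach ($memberships as $group => $members) {
        if (in_array($user, $members, true)) {
            $groups[] = $group;
        }
    }
    return $groups;
};

// Breadth-first walk that stops once at least $limit users have been seen.
function reachableUsers($start, $limit, $groupsOf, $loadMembers)
{
    $seenUsers = array($start => true);
    $seenGroups = array();
    $queue = array($start);

    while ($queue && count($seenUsers) < $limit) {
        $user = array_shift($queue);
        foreach ($groupsOf($user) as $group) {
            if (isset($seenGroups[$group])) {
                continue;
            }
            $seenGroups[$group] = true;
            foreach ($loadMembers($group) as $member) {
                if (!isset($seenUsers[$member])) {
                    $seenUsers[$member] = true;
                    $queue[] = $member;
                }
            }
            if (count($seenUsers) >= $limit) {
                break; // enough users: do not touch further groups
            }
        }
    }
    return array_keys($seenUsers);
}

$users = reachableUsers('alice', 2, $groupsOf, $loadMembers);
```

How many groups get loaded depends on the data met along the way, which is exactly what a fixed number of JOINs cannot express.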

Implementation

Lazy Loading is generally implemented with the Proxy pattern, where fake entities are substituted for the real objects at the boundary of the object graph. These fake entities do not load their relationships and fields when they are first inserted in the graph.
The Proxy pattern implementation used here is the ghost variant, where the object is essentially a proxy for itself, since the fields are loaded and stored in it on first access. The ghost has only its identifier fields loaded by default, and it is inserted in the Identity Map anyway.
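
A ghost can be sketched by hand as below. The UserGhost class, the $loader callable and the array used as Identity Map are assumptions made for illustration; an ORM would generate and manage the equivalent pieces itself.

```php
<?php
// Hypothetical ghost: holds only its identifier until a field is needed.
class UserGhost
{
    private $id;
    private $name;
    private $loaded = false;
    private $loader; // callable returning the row for an id (assumption)

    public function __construct($id, $loader)
    {
        $this->id = $id;
        $this->loader = $loader;
    }

    public function getId()
    {
        return $this->id; // the identifier never triggers a load
    }

    public function getName()
    {
        $this->load();
        return $this->name;
    }

    private function load()
    {
        if ($this->loaded) {
            return;
        }
        $row = call_user_func($this->loader, $this->id);
        $this->name = $row['name']; // fields are stored in the ghost itself
        $this->loaded = true;
    }
}

$queries = 0;
$loader = function ($id) use (&$queries) {
    $queries++; // stands in for a SELECT by primary key
    return array('name' => 'user' . $id);
};

// A tiny Identity Map: one object per identifier, ghost or not.
$identityMap = array();
$identityMap[42] = new UserGhost(42, $loader);

$ghost = $identityMap[42];
$idOnly = $ghost->getId();
$queriesAfterId = $queries;   // still no query issued
$name = $ghost->getName();
$ghost->getName();
$queriesAfterName = $queries; // one query, despite two calls
```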

A PHP implementation of the Lazy Loading mechanism in an ORM has to respect some caveats to work.
For starters, the Proxy object substituted for the real one has to conform to the same interface as the original class. This is accomplished via subclassing, since there is usually no explicit interface (defined with the interface keyword) for domain objects. This subclass may be generated ahead of time or at runtime, when the time comes to insert a proxy in the graph.
The methods of the entity, such as getters, setters, or logic-related ones, are overridden, and their new implementations call a load() method before delegating to the parent method.

The private load() method uses a reference to the Entity Manager, or to some internal component of the ORM, to load the fields of the object itself and its relationships. Note that these referenced objects can be proxies in turn. Of course, the load() method has a boolean guard that makes it execute queries only once: after the object has been loaded, it behaves like an instance of the original class.
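
A hand-written equivalent of such a subclass might look like the following sketch. Article, ArticleProxy, the hydrate() helper and the $loader callable are all hypothetical names chosen for the example; a generated proxy would receive a component of the ORM instead of a closure.

```php
<?php
// Hypothetical domain class, unaware of persistence.
class Article
{
    protected $id;
    protected $title;

    public function getTitle()
    {
        return $this->title;
    }

    public function setTitle($title)
    {
        $this->title = $title;
    }
}

// Subclass proxy: same interface, every method loads before delegating.
class ArticleProxy extends Article
{
    private $loader;              // callable($id, $proxy) that fills the fields (assumption)
    private $initialized = false; // boolean guard: load at most once

    public function __construct($id, $loader)
    {
        $this->id = $id;
        $this->loader = $loader;
    }

    private function load()
    {
        if (!$this->initialized) {
            $this->initialized = true;
            call_user_func($this->loader, $this->id, $this);
        }
    }

    // Called by the loader to populate the protected fields.
    public function hydrate(array $row)
    {
        $this->title = $row['title'];
    }

    public function getTitle()
    {
        $this->load();
        return parent::getTitle();
    }

    public function setTitle($title)
    {
        $this->load();
        parent::setTitle($title);
    }
}

$loads = 0;
$loader = function ($id, ArticleProxy $proxy) use (&$loads) {
    $loads++; // stands in for the query run by the ORM
    $proxy->hydrate(array('title' => 'Article #' . $id));
};

$article = new ArticleProxy(7, $loader);
$isArticle = $article instanceof Article; // type hints on Article still pass
$loadsBefore = $loads;
$title = $article->getTitle();
$article->getTitle();
$loadsAfter = $loads;
```

Because the proxy is a subclass, client code that type hints on Article cannot tell the difference, and the guard keeps repeated calls from repeating the query.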

Lazy Loading can also be used for public properties by overriding __get() and __set(), but this is perhaps too transparent, as accessing a public property may cause a query to the database. PHP magic methods are often abused.
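
To make the risk concrete, here is a small sketch of a lazy public property. It covers reads only, through __get(), and relies on the PHP behavior that reading an unset declared property is routed to __get(). Customer, CustomerProxy and the $loader callable are hypothetical.

```php
<?php
// Hypothetical entity exposing public properties.
class Customer
{
    public $id;
    public $email;
}

// Proxy that makes the public property lazy by unsetting it, so that
// PHP routes the first read through __get().
class CustomerProxy extends Customer
{
    private $loader; // callable returning the email for an id (assumption)

    public function __construct($id, $loader)
    {
        $this->id = $id;
        $this->loader = $loader;
        unset($this->email); // reads of ->email now reach __get()
    }

    public function __get($name)
    {
        if ($name === 'email') {
            // Assigning restores the declared property, so later reads
            // no longer go through __get().
            $this->email = call_user_func($this->loader, $this->id);
            return $this->email;
        }
        trigger_error('Undefined property: ' . $name, E_USER_NOTICE);
        return null;
    }
}

$queries = 0;
$customer = new CustomerProxy(3, function ($id) use (&$queries) {
    $queries++; // a plain property read hides this "query"
    return 'customer' . $id . '@example.com';
});

$queriesBefore = $queries;
$email = $customer->email; // looks like a field access, runs the loader
$again = $customer->email; // served from the property, no second load
$queriesAfter = $queries;
```

Nothing at the call site hints that $customer->email may hit the database, which is why this degree of transparency deserves caution.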

Performance

Lazy Loading may cause performance problems when used extensively. The main issue is that it promotes many small queries that load objects one at a time, instead of one big JOIN that loads the whole needed subset of the object graph and avoids the overhead of the unnecessary queries and the chatty communication. Even if the database is high-performance, or not relational, if it resides on another machine every round trip over the network adds latency, so communication is best performed in batches.
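
The difference in round trips can be shown with a toy data source that counts them. CountingStore and its two methods are invented for this sketch; the batch method plays the role of a JOIN or of an IN (...) query.

```php
<?php
// Hypothetical data source that counts round trips.
class CountingStore
{
    public $roundTrips = 0;
    private $groupsByUser = array(
        1 => array('admins'),
        2 => array('devs'),
        3 => array('devs', 'ops'),
    );

    // One round trip per user: what lazy loading does inside a loop.
    public function groupsFor($userId)
    {
        $this->roundTrips++;
        return $this->groupsByUser[$userId];
    }

    // One round trip for many users: the batch equivalent.
    public function groupsForAll(array $userIds)
    {
        $this->roundTrips++;
        return array_intersect_key($this->groupsByUser, array_flip($userIds));
    }
}

$userIds = array(1, 2, 3);

// Lazy style: each iteration triggers its own small query.
$lazyStore = new CountingStore();
$lazy = array();
foreach ($userIds as $id) {
    $lazy[$id] = $lazyStore->groupsFor($id);
}

// Eager style: the same data fetched in a single batch.
$eagerStore = new CountingStore();
$eager = $eagerStore->groupsForAll($userIds);
```

Both variants end up with the same data; the lazy loop pays one round trip per user, while the batch pays one in total.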

The parts of the graph to eager load via joins (instead of lazy loading) can be specified while querying, via the language of choice (HQL, JPQL, LINQ, DQL) or via a programmatic interface (a Criteria object). This piece of relational information crosses the boundary of the ORM to reach its client code, but it is necessary in order to optimize the generated queries to an acceptable level. One solution is to encapsulate it in Repository implementations, which provide a domain-specific interface and hide all the queries written in the object-related language of the ORM.
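
One possible shape for such a Repository is sketched below. QueryRunner is a hypothetical abstraction over the ORM's query facility (it is not a Doctrine interface), the query string is only an illustrative DQL-style fetch join, and RecordingRunner exists so the sketch runs without an ORM.

```php
<?php
// Hypothetical abstraction over the ORM's query facility.
interface QueryRunner
{
    public function run($query, array $params);
}

// Domain-facing interface: no query language leaks to client code.
interface UserRepository
{
    public function findWithGroups($userId);
}

class OrmUserRepository implements UserRepository
{
    private $runner;

    public function __construct(QueryRunner $runner)
    {
        $this->runner = $runner;
    }

    public function findWithGroups($userId)
    {
        // The fetch join lives here, next to the knowledge that callers
        // of this method always need the groups.
        $query = 'SELECT u, g FROM User u JOIN u.groups g WHERE u.id = :id';
        return $this->runner->run($query, array('id' => $userId));
    }
}

// Recording stand-in that returns canned data instead of querying.
class RecordingRunner implements QueryRunner
{
    public $log = array();

    public function run($query, array $params)
    {
        $this->log[] = array($query, $params);
        return array('id' => $params['id'], 'groups' => array('admins'));
    }
}

$runner = new RecordingRunner();
$repository = new OrmUserRepository($runner);
$user = $repository->findWithGroups(5);
```

Client code asks for "a user with its groups" and never sees the join, so the eager-loading decision can change in one place.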

Lazy Loading is complex, and it is a perfect candidate for outsourcing to a library such as an ORM. Be aware of its performance implications and you will be able to take advantage of it without straining your database.

Example

The code sample for this ORM pattern is taken from Doctrine 2, where I contributed part of the code related to lazy loading one-to-one and many-to-one relationships. The class presented here is the ProxyFactory, which generates a proxy object for a particular entity and, if needed, the source code for the proxy class. I have extended some of the docblock comments.

<?php
namespace Doctrine\ORM\Proxy;

use Doctrine\ORM\EntityManager,
    Doctrine\ORM\Mapping\ClassMetadata,
    Doctrine\ORM\Mapping\AssociationMapping;

/**
 * This factory is used to create proxy objects for entities at runtime.
 *
 * @author Roman Borschel <roman@code-factory.org>
 * @author Giorgio Sironi <piccoloprincipeazzurro@gmail.com>
 * @since 2.0
 */
class ProxyFactory
{
    /** The EntityManager this factory is bound to. */
    private $_em;
    /** Whether to automatically (re)generate proxy classes. */
    private $_autoGenerate;
    /** The namespace that contains all proxy classes. */
    private $_proxyNamespace;
    /** The directory that contains all proxy classes. */
    private $_proxyDir;

    /**
     * Initializes a new instance of the <tt>ProxyFactory</tt> class that is
     * connected to the given <tt>EntityManager</tt>.
     *
     * @param EntityManager $em The EntityManager the new factory works for.
     * @param string $proxyDir The directory to use for the proxy classes. It must exist.
     * @param string $proxyNs The namespace to use for the proxy classes.
     * @param boolean $autoGenerate Whether to automatically generate proxy classes.
     */
    public function __construct(EntityManager $em, $proxyDir, $proxyNs, $autoGenerate = false)
    {
        if ( ! $proxyDir) {
            throw ProxyException::proxyDirectoryRequired();
        }
        if ( ! $proxyNs) {
            throw ProxyException::proxyNamespaceRequired();
        }
        $this->_em = $em;
        $this->_proxyDir = $proxyDir;
        $this->_autoGenerate = $autoGenerate;
        $this->_proxyNamespace = $proxyNs;
    }

    /**
     * Gets a reference proxy instance for the entity of the given type and identified by
     * the given identifier.
     * Generally this method will reuse the source code it has already generated, even in
     * other HTTP requests, since the source files are saved in a configurable folder.
     * This is the only public method the other parts of the ORM will generally use.
     *
     * @param string $className
     * @param mixed $identifier
     * @return object
     */
    public function getProxy($className, $identifier)
    {
        $proxyClassName = str_replace('\\', '', $className) . 'Proxy';
        $fqn = $this->_proxyNamespace . '\\' . $proxyClassName;

        if ($this->_autoGenerate && ! class_exists($fqn, false)) {
            $fileName = $this->_proxyDir . DIRECTORY_SEPARATOR . $proxyClassName . '.php';
            $this->_generateProxyClass($this->_em->getClassMetadata($className), $proxyClassName, $fileName, self::$_proxyClassTemplate);
            require $fileName;
        }

        if ( ! $this->_em->getMetadataFactory()->hasMetadataFor($fqn)) {
            $this->_em->getMetadataFactory()->setMetadataFor($fqn, $this->_em->getClassMetadata($className));
        }

        $entityPersister = $this->_em->getUnitOfWork()->getEntityPersister($className);

        return new $fqn($entityPersister, $identifier);
    }

    /**
     * Generates proxy classes for all given classes.
     * Used for pre-generation from the command line, in case PHP on the hosting service
     * does not have the rights to create new files in the proxies folder.
     *
     * @param array $classes The classes (ClassMetadata instances) for which to generate proxies.
     * @param string $toDir The target directory of the proxy classes. If not specified, the
     *                      directory configured on the Configuration of the EntityManager used
     *                      by this factory is used.
     */
    public function generateProxyClasses(array $classes, $toDir = null)
    {
        $proxyDir = $toDir ?: $this->_proxyDir;
        $proxyDir = rtrim($proxyDir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
        foreach ($classes as $class) {
            $proxyClassName = str_replace('\\', '', $class->name) . 'Proxy';
            $proxyFileName = $proxyDir . $proxyClassName . '.php';
            $this->_generateProxyClass($class, $proxyClassName, $proxyFileName, self::$_proxyClassTemplate);
        }
    }

    /**
     * Generates a proxy class file.
     * Substitutes certain parameters like class name and methods in a template
     * kept at the end of this file. The class source code is saved in a file in the directory specified.
     * Testing this method is usually not a problem since the directory is easily configurable.
     *
     * @param ClassMetadata $class The metadata of the entity class to proxy.
     * @param string $proxyClassName The short name of the proxy class.
     * @param string $fileName The path of the file to write to.
     * @param string $file The proxy class template whose placeholders are replaced.
     */
    private function _generateProxyClass($class, $proxyClassName, $fileName, $file)
    {
        $methods = $this->_generateMethods($class);
        $sleepImpl = $this->_generateSleep($class);

        $placeholders = array(
            '<namespace>',
            '<proxyClassName>', '<className>',
            '<methods>', '<sleepImpl>'
        );

        if(substr($class->name, 0, 1) == "\\") {
            $className = substr($class->name, 1);
        } else {
            $className = $class->name;
        }

        $replacements = array(
            $this->_proxyNamespace,
            $proxyClassName, $className,
            $methods, $sleepImpl
        );

        $file = str_replace($placeholders, $replacements, $file);

        file_put_contents($fileName, $file);
    }

    /**
     * Generates the methods of a proxy class.
     * All methods are overridden to call _load() before execution.
     *
     * @param ClassMetadata $class
     * @return string The code of the generated methods.
     */
    private function _generateMethods(ClassMetadata $class)
    {
        $methods = '';

        foreach ($class->reflClass->getMethods() as $method) {
            /* @var $method ReflectionMethod */
            if ($method->isConstructor() || strtolower($method->getName()) == "__sleep") {
                continue;
            }

            if ($method->isPublic() && ! $method->isFinal() && ! $method->isStatic()) {
                $methods .= PHP_EOL . ' public function ';
                if ($method->returnsReference()) {
                    $methods .= '&';
                }
                $methods .= $method->getName() . '(';
                $firstParam = true;
                $parameterString = $argumentString = '';

                foreach ($method->getParameters() as $param) {
                    if ($firstParam) {
                        $firstParam = false;
                    } else {
                        $parameterString .= ', ';
                        $argumentString .= ', ';
                    }

                    // We need to pick the type hint class too
                    if (($paramClass = $param->getClass()) !== null) {
                        $parameterString .= '\\' . $paramClass->getName() . ' ';
                    } else if ($param->isArray()) {
                        $parameterString .= 'array ';
                    }

                    if ($param->isPassedByReference()) {
                        $parameterString .= '&';
                    }

                    $parameterString .= '$' . $param->getName();
                    $argumentString .= '$' . $param->getName();

                    if ($param->isDefaultValueAvailable()) {
                        $parameterString .= ' = ' . var_export($param->getDefaultValue(), true);
                    }
                }

                $methods .= $parameterString . ')';
                $methods .= PHP_EOL . ' {' . PHP_EOL;
                $methods .= ' $this->_load();' . PHP_EOL;
                $methods .= ' return parent::' . $method->getName() . '(' . $argumentString . ');';
                $methods .= PHP_EOL . ' }' . PHP_EOL;
            }
        }

        return $methods;
    }

    /**
     * Generates the code for the __sleep method for a proxy class.
     * The __sleep() method is used in case of serialization, which should
     * not include service objects referenced by proxies like the Entity Manager.
     *
     * @param $class
     * @return string
     */
    private function _generateSleep(ClassMetadata $class)
    {
        $sleepImpl = '';

        if ($class->reflClass->hasMethod('__sleep')) {
            $sleepImpl .= 'return parent::__sleep();';
        } else {
            $sleepImpl .= 'return array(';
            $first = true;

            foreach ($class->getReflectionProperties() as $name => $prop) {
                if ($first) {
                    $first = false;
                } else {
                    $sleepImpl .= ', ';
                }

                $sleepImpl .= "'" . $name . "'";
            }

            $sleepImpl .= ');';
        }

        return $sleepImpl;
    }

    /** Proxy class code template */
    private static $_proxyClassTemplate =
'<?php

namespace <namespace>;

/**
 * THIS CLASS WAS GENERATED BY THE DOCTRINE ORM. DO NOT EDIT THIS FILE.
 */
class <proxyClassName> extends \<className> implements \Doctrine\ORM\Proxy\Proxy
{
    private $_entityPersister;
    private $_identifier;
    public $__isInitialized__ = false;
    public function __construct($entityPersister, $identifier)
    {
        $this->_entityPersister = $entityPersister;
        $this->_identifier = $identifier;
    }
    private function _load()
    {
        if (!$this->__isInitialized__ && $this->_entityPersister) {
            $this->__isInitialized__ = true;
            if ($this->_entityPersister->load($this->_identifier, $this) === null) {
                throw new \Doctrine\ORM\EntityNotFoundException();
            }
            unset($this->_entityPersister);
            unset($this->_identifier);
        }
    }

    <methods>

    public function __sleep()
    {
        if (!$this->__isInitialized__) {
            throw new \RuntimeException("Not fully loaded proxy can not be serialized.");
        }
        <sleepImpl>
    }
}';
}
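
To see the generation technique in isolation, here is a much reduced sketch of the idea behind _generateMethods(). It is not the Doctrine implementation: it ignores type hints, by-reference parameters and __sleep(), it uses eval() where the factory above writes a file and requires it, and its _load() only counts calls instead of querying. Product and generateProxySource() are hypothetical names.

```php
<?php
// Hypothetical entity used only for this demonstration.
class Product
{
    protected $name = 'unloaded';

    public function getName()
    {
        return $this->name;
    }

    public function rename($name, $suffix = '')
    {
        $this->name = $name . $suffix;
        return $this;
    }
}

// Builds the source of a subclass whose public methods call _load() first.
function generateProxySource($className, $proxyClassName)
{
    $methods = '';
    $reflection = new ReflectionClass($className);

    foreach ($reflection->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
        if ($method->isConstructor() || $method->isFinal() || $method->isStatic()) {
            continue;
        }
        $params = array();
        $args = array();
        foreach ($method->getParameters() as $param) {
            $declaration = '$' . $param->getName();
            if ($param->isDefaultValueAvailable()) {
                $declaration .= ' = ' . var_export($param->getDefaultValue(), true);
            }
            $params[] = $declaration;
            $args[] = '$' . $param->getName();
        }
        $methods .= '    public function ' . $method->getName()
            . '(' . implode(', ', $params) . ")\n    {\n"
            . "        \$this->_load();\n"
            . '        return parent::' . $method->getName()
            . '(' . implode(', ', $args) . ");\n    }\n";
    }

    // The generated _load() only counts calls; a real one would query.
    return 'class ' . $proxyClassName . ' extends ' . $className . "\n{\n"
        . "    public \$loadCalls = 0;\n"
        . "    private function _load() { \$this->loadCalls++; }\n"
        . $methods . "}\n";
}

$source = generateProxySource('Product', 'ProductProxy');
eval($source); // define the generated class for this demonstration

$proxy = new ProductProxy();
$proxy->rename('Lamp', ' v2');
$name = $proxy->getName();
```

Printing $source shows the overridden methods, each one calling _load() before delegating to the parent with the same arguments.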
Published at DZone with permission of Giorgio Sironi, author and DZone MVB.
