
Matthew Turland is a Senior Engineer at Synacor, where he develops internet solutions with a variety of technologies. He began working with PHP in 2002 and went on to publish the book "Web Scraping with PHP." In his spare time, he leads development on the Phergie project and serves as an organizer for the Acadiana Open Source Group. Matthew has posted 10 posts at DZone.

Renaming a DOMNode in PHP

09.15.2011

A recent work assignment had me using PHP to pull HTML data into a DOMDocument instance and renaming some elements, such as b to strong or i to em. As it turns out, renaming elements using the DOM extension is rather tedious.

Level 3 of the DOM Core specification introduces a renameNode() method on Document, but the PHP DOM extension doesn’t currently support it.

The $nodeName property of the DOMNode class is read-only, so it can’t be changed that way.

A new element with a different name can be created in the same document with createElement(), but if you also pass it a value, that value is treated as text rather than markup: special characters such as < and > are escaped. It’s therefore not possible to pass in the intended inner content of a node if that content contains other nodes.
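A small sketch of that limitation, using made-up markup: the value passed to createElement() ends up as a single text node, and none of the tags inside it become elements.

```php
<?php
$doc = new DOMDocument();

// Try to create <strong> with markup as its initial value.
$strong = $doc->createElement('strong', '<em>italic</em> text');

// The value became one text node; no <em> element was created.
echo $strong->childNodes->length, "\n";                  // 1
var_dump($strong->firstChild instanceof DOMText);       // bool(true)
echo $strong->getElementsByTagName('em')->length, "\n"; // 0

// Serializing shows the angle brackets escaped as entities.
$xml = $doc->saveXML($strong);
echo $xml, "\n";
```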

The only method I’ve found that works is to replicate the attributes and child nodes of the original node in a new one. Attributes are fairly easy to copy, but replicating children gave me trouble: only the first child of each node made it into its intended replacement, and the remaining children were dropped. Here’s the original code that exhibited this behavior.

foreach ($oldNode->childNodes as $childNode) {
    $newNode->appendChild($childNode);
}

The reason for this behavior is that $oldNode->childNodes is a live DOMNodeList. Each appendChild() call detaches $childNode from $oldNode before attaching it to $newNode, which modifies that list while foreach is still iterating over it, so the iterator’s position no longer points at the next child in the original list.

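To get around this, I took advantage of the fact that any node with children always has a $firstChild property pointing to the first one, and that this property becomes null once the node is empty. Since appendChild() removes each child from $oldNode as it moves it, repeatedly moving $oldNode->firstChild eventually transfers every child, in order. The modified code that takes this approach is below and has the behavior I originally set out to implement.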

while ($oldNode->firstChild) {
    $newNode->appendChild($oldNode->firstChild);
}
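Another common way around the live-list problem, not the approach used in this article, is to copy childNodes into a plain PHP array with iterator_to_array() before moving anything. The sketch below uses a made-up <b> fragment; the foreach then walks the array, which doesn’t change as nodes are moved.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><b>one <i>two</i> three</b></root>');

$oldNode = $doc->getElementsByTagName('b')->item(0);
$newNode = $doc->createElement('strong');

// Snapshot the live list once, before any node is moved.
$children = iterator_to_array($oldNode->childNodes);

foreach ($children as $childNode) {
    // appendChild() detaches $childNode from $oldNode, but $children
    // is an ordinary array, so the loop still visits every node.
    $newNode->appendChild($childNode);
}

echo $newNode->childNodes->length, "\n"; // 3
echo $oldNode->childNodes->length, "\n"; // 0
```

The firstChild loop avoids the extra array, while the snapshot version keeps the familiar foreach shape; both move the same nodes in the same order.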

If you’re curious, below is the full code segment for renaming a node.

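// Create the replacement element in the same document
$newNode = $oldNode->ownerDocument->createElement('new_element_name');

// Copy each attribute by name and value
foreach ($oldNode->attributes as $attribute) {
    $newNode->setAttribute($attribute->nodeName, $attribute->nodeValue);
}

// Move the children; appendChild() detaches each one from $oldNode
while ($oldNode->firstChild) {
    $newNode->appendChild($oldNode->firstChild);
}

// Swap the nodes via the actual parent; calling replaceChild() on
// ownerDocument only works when $oldNode is the document's root element
$oldNode->parentNode->replaceChild($newNode, $oldNode);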
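The segment above can be wrapped in a reusable function and applied to the b-to-strong and i-to-em case from the assignment. This is a sketch: the name renameElement() and the sample markup are hypothetical, it assumes non-namespaced elements and attributes, and it calls replaceChild() on parentNode so that nested elements work too. Because getElementsByTagName() also returns a live list, the matches are copied into an array before any of them are renamed.

```php
<?php
// Hypothetical helper; assumes $oldNode is attached to a parent and that
// neither the element nor its attributes use XML namespaces.
function renameElement(DOMElement $oldNode, string $newName): DOMElement
{
    $newNode = $oldNode->ownerDocument->createElement($newName);

    // Copy attributes by name and value.
    foreach ($oldNode->attributes as $attribute) {
        $newNode->setAttribute($attribute->nodeName, $attribute->nodeValue);
    }

    // Move children; appendChild() detaches each one from $oldNode.
    while ($oldNode->firstChild) {
        $newNode->appendChild($oldNode->firstChild);
    }

    // Replace through the real parent: new node first, old node second.
    $oldNode->parentNode->replaceChild($newNode, $oldNode);

    return $newNode;
}

$doc = new DOMDocument();
$doc->loadXML('<p class="intro"><b id="x">bold <i>and italic</i></b> text</p>');

$map = ['b' => 'strong', 'i' => 'em'];
foreach ($map as $from => $to) {
    // getElementsByTagName() is live, so snapshot the matches first.
    foreach (iterator_to_array($doc->getElementsByTagName($from)) as $element) {
        renameElement($element, $to);
    }
}

echo $doc->saveXML($doc->documentElement), "\n";
// <p class="intro"><strong id="x">bold <em>and italic</em></strong> text</p>
```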

Another potential “gotcha” is the argument order of the replaceChild() method, which is the new node followed by the old node rather than the reverse that most people might expect. Thanks to Joshua May for pointing that one out to me; I might never have understood why I was getting a “Not Found Error” DOMException otherwise.
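A short sketch of both pitfalls, with made-up markup: replaceChild() takes the new node first, and it has to be called on the old node’s actual parent. Calling it on the document for a nested element throws a DOMException (the “Not Found Error” mentioned above); the exact message may differ between PHP versions, so the sketch just prints it.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<p><b>bold</b></p>');

$old = $doc->getElementsByTagName('b')->item(0);
$new = $doc->createElement('strong', 'bold');

$caught = false;
try {
    // <b> is a child of <p>, not of the document itself, so this throws.
    $doc->replaceChild($new, $old);
} catch (DOMException $e) {
    $caught = true;
    echo 'DOMException: ', $e->getMessage(), "\n";
}

// New node first, old node second, called on the real parent.
// replaceChild() returns the node that was replaced.
$replaced = $old->parentNode->replaceChild($new, $old);

echo $doc->saveXML($doc->documentElement), "\n"; // <p><strong>bold</strong></p>
```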

Published at DZone with permission of its author, Matthew Turland.


Comments

Nabeel Manara replied on Fri, 2012/01/27 - 11:38am

The last line was throwing Not Found exceptions for me until I switched out ownerDocument for parentNode – I’m guessing this could depend on whether the node you’re replacing is top level or not.

So the last line becomes: $oldNode->parentNode->replaceChild($newNode, $oldNode);

Now if only PHP would actually roll out some core extensions that fully implemented the things they said they were implementing. Getting pretty sick of it these days.
