Raymond Camden is a developer evangelist for Adobe. His work focuses on web standards, mobile development and ColdFusion. He's a published author and presents at conferences and user groups on a variety of topics. He is the happily married proud father of three kids and is somewhat of a Star Wars nut. Raymond can be reached via his blog at www.raymondcamden.com or via email at raymondcamden@gmail.com. Raymond is a DZone MVB and is not an employee of DZone.

Reading Microdata Elements in Chrome

12.05.2012

Before going any further, please note this blog post definitely falls into the "questionable" category. Please read the following with a large grain of salt (and a cold beer at your side). I've read a few articles recently on microdata. Today I read another good one here: Make Your Page Consumable by Robots and Humans Alike With Microdata.

The concept is rather simple. By embedding a bit of metadata into your markup, you give your pages machine-readable context. This is a bit like data attributes, but in my mind a bit different. Data attributes are, in my opinion, useful for data in a self-contained manner. That is, you mark up your pages so your own code (JavaScript or CSS) can do something with it. Microdata is for external consumers. Mixed with external schemas this could be pretty powerful. Apparently Google is already using this, so it has some SEO value as well.

I got even more interested when I saw there was a DOM API for it: document.getItems(). This would, supposedly, return all the microdata items in your current document. Unfortunately, this failed in Chrome. Surprisingly, CanIUse.com failed to report on the API and I had to dig a bit more to find that - apparently - only Firefox and Opera support this API at the moment.
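A quick way to check for the native API before relying on it is plain feature detection. This is only a sketch: the function name hasNativeMicrodataApi is made up for illustration, and the document is passed in as an argument (rather than read from the global) so the check can also be run against a stub outside a browser.

```javascript
// Hypothetical helper: reports whether the given document object exposes
// the microdata DOM API (document.getItems) discussed above.
function hasNativeMicrodataApi(doc) {
    // Guard against null/undefined, then check that getItems is callable.
    return !!doc && typeof doc.getItems === "function";
}
```

In a page you would call hasNativeMicrodataApi(document) and fall back to your own lookup when it returns false.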

I wanted to build something that would a) notice if microdata was in use and b) report how it was used. I knew I could get, and iterate over, every element in the DOM, but I assumed that would be rather wasteful. Then I discovered the document.evaluate function. This allows you to use XPath to search the DOM. So with that at my disposal, I first created a function that would check for the existence of any microdata in use:

function hasItems() {
    return document.evaluate("count(/html/body//*[@itemscope])", document, null, XPathResult.NUMBER_TYPE, null).numberValue > 0;
}

If you didn't read the article I linked to before, the itemscope attribute "wraps" the DOM elements that are considered one logical unit of microdata. My XPath simply looks for elements carrying this attribute and runs a count() operation to get the number that match.
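For comparison, the same existence check can probably be done without XPath by using a CSS attribute selector. This is a different technique from the one above, not a rewrite of it, and the name hasItemsViaSelector is hypothetical. The root node is passed in, so handing it document.body roughly mirrors the /html/body scope of the XPath version.

```javascript
// Hypothetical alternative to hasItems(): uses querySelector with an
// attribute selector instead of document.evaluate and XPath.
function hasItemsViaSelector(root) {
    // querySelector returns the first matching descendant, or null if none.
    return root.querySelector("[itemscope]") !== null;
}
```

Usage in a page would be hasItemsViaSelector(document.body). Because querySelector stops at the first match, it never has to count every item.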

I then wrote a function that would return these items. For the most part, this is a simple matter of iterating over XPath results and using DOM functions to get values, but you have to use a bit of logic based on what type of DOM node you're dealing with. So for example, if an anchor tag is used for a property, then the microdata value comes from the href attribute. For most other things you simply use the inner text. Here's my getItems function (and yes, that name is too generic):

function getItems() {
    var items = document.evaluate("/html/body//*[@itemscope]", document, null, XPathResult.ANY_TYPE, null); 
    var results = [];
    var result = items.iterateNext();
    while(result) {
        var kids = document.evaluate(".//*[@itemprop]", result, null, XPathResult.ANY_TYPE, null); 
        var item = {};
        var kidprop = kids.iterateNext();
        while(kidprop) {
            var attr = kidprop.attributes.getNamedItem("itemprop");
            // How the value is read depends on the element type
            var value="";
            switch(kidprop.nodeName) {
                case "AREA":
                case "LINK":
                case "A":
                    value = kidprop.href;
                    break;

                case "AUDIO":
                case "EMBED":
                case "IFRAME":
                case "IMG":
                case "SOURCE":
                case "VIDEO":
                    value = kidprop.src;
                    break;

                default: 
                    value = kidprop.innerText;
                    break;
            }
            item[attr.nodeValue] = value;
            kidprop = kids.iterateNext();
        }

        results.push(item);
        result = items.iterateNext();
    }
    return results;
}
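
The switch in getItems covers the common elements. As a hedged sketch of how it might be extended, the two helpers below (both hypothetical names, not part of the function above) pull the value lookup into its own function and handle two cases the loop doesn't: an itemprop attribute holding several space-separated names, and the same name appearing more than once in one item. The extra element rules (meta, object, data, meter, time, track) follow my reading of the microdata spec, so treat them as assumptions to verify.

```javascript
// Hypothetical helper: maps an element to its microdata property value.
// It only reads plain properties, so it works on real DOM elements and on
// simple stand-in objects alike.
function getPropertyValue(el) {
    switch (el.nodeName) {
        case "META":
            return el.content;
        case "A":
        case "AREA":
        case "LINK":
            return el.href;
        case "AUDIO":
        case "EMBED":
        case "IFRAME":
        case "IMG":
        case "SOURCE":
        case "TRACK":
        case "VIDEO":
            return el.src;
        case "OBJECT":
            return el.data;
        case "DATA":
        case "METER":
            return String(el.value);
        case "TIME":
            // Assumption: prefer the datetime attribute, else the text.
            return el.dateTime || el.textContent;
        default:
            return el.textContent;
    }
}

// Hypothetical helper: stores a value under every name listed in an
// itemprop attribute. A repeated name collects its values in an array
// instead of overwriting the earlier one.
function addProperty(item, itemprop, value) {
    itemprop.split(/\s+/).filter(Boolean).forEach(function (name) {
        if (!Object.prototype.hasOwnProperty.call(item, name)) {
            item[name] = value;
        } else if (Array.isArray(item[name])) {
            item[name].push(value);
        } else {
            item[name] = [item[name], value];
        }
    });
    return item;
}
```

Inside the inner loop, the assignment item[attr.nodeValue] = value could then become addProperty(item, attr.nodeValue, getPropertyValue(kidprop)).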

I used some source HTML based on the article I linked to earlier:

<ul>
<li itemscope>
    <ul>
        <li>Name: <span style="foo" itemprop="name2">Fred</span></li>
        <li>Name: <span itemprop="name">Fred</span></li>
        <li>Phone: <span itemprop="telephone">210-555-5555</span></li>
        <li>Email: <span itemprop="email">thebuffalo@rockandstone.com</span></li>
        <li>Site: <a href="foo.html" itemprop="url">My site</a></li>
    </ul>
</li>
<li itemscope>
    <ul>
        <li>Name: <span itemprop="name">Wilma</span></li>
        <li>Phone: <span itemprop="telephone">210-555-7777</span></li>
        <li>Email: <span itemprop="email">thewife@rockandstone.com</span></li>
    </ul>
</li>
<li itemscope>
    <ul>
        <li>Name: <span itemprop="name">Betty</span></li>
        <li>Phone: <span itemprop="telephone">210-555-8888</span></li>
        <li>Email: <span itemprop="email">theneighbour@rockandstone.com</span></li>
    </ul>
</li>
<li itemscope>
    <ul>
        <li>Name: <span itemprop="name">Barny</span></li>
        <li>Phone: <span itemprop="telephone">210-555-0000</span></li>
        <li>Email: <span itemprop="email">thebestfriend@rockandstone.com</span></li>
    </ul>
</li>
</ul>

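When I execute my JavaScript against this, I get an array of four objects, one for each itemscope element, with each object keyed by its itemprop names.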

Useful? Not sure yet. I assume, eventually, Chrome will get the native API anyway. (Although in Firefox it returns the DOM nodes themselves, not a nice array like I've got. Unless I'm using it wrong, it looks like there may still be a need for a utility function.)




Published at DZone with permission of Raymond Camden, author and DZone MVB. (source)
