
Ariya is a passionate engineer interested in bleeding-edge technologies. He has been involved in various large projects, from KDE to WebKit. These days, his focus is mostly on software craftsmanship around web technologies. His (little) spare time is spent running the projects PhantomJS (headless WebKit) and Esprima (JavaScript parser). Ariya is a DZone MVB and is not an employee of DZone and has posted 53 posts at DZone.

Capturing a Web Page Without Stylesheets

06.11.2013

It is amazing to live in an environment where Internet connections are ubiquitous and fast. But if the tubes are having problems and the bits from the web server arrive broken into random pieces, what does the web site look like? If the content degrades gracefully, the missing style sheets may make the page less attractive, but they should not significantly hamper the experience. Fortunately, there is a way to automatically check the appearance of a web page under those circumstances.

Some time ago, I demonstrated the use of PhantomJS, the headless WebKit, to capture web pages programmatically. The example was also extended to capture just a particular portion of the page via clipping. For a CSS-less capture, we only need to extend it with a new feature in PhantomJS 1.9 (implemented by Vitaliy Slobodin): the ability to abort network requests.
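In PhantomJS 1.9, the onResourceRequested callback receives the request metadata (requestData, whose url property holds the address) and a network request object with an abort() method. The sketch below wraps that contract in a reusable handler factory; the name createRequestBlocker is hypothetical and not part of PhantomJS. Because it only depends on the callback shape, it can be exercised with plain mock objects outside PhantomJS.

```javascript
// Hypothetical helper (not a PhantomJS API): builds an onResourceRequested
// handler that aborts every request whose URL satisfies the predicate.
function createRequestBlocker(shouldBlock, log) {
    return function (requestData, request) {
        if (shouldBlock(requestData.url)) {
            if (log) {
                log('Skipping ' + requestData.url);
            }
            request.abort();
        }
    };
}

// Inside a PhantomJS 1.9 script it would be installed like this:
//   page.onResourceRequested = createRequestBlocker(function (url) {
//       return /\.css$/i.test(url);
//   }, console.log);

// Outside PhantomJS, a mock request object records whether abort() was called.
function mockRequest() {
    return { aborted: false, abort: function () { this.aborted = true; } };
}

var blockCss = createRequestBlocker(function (url) {
    return /\.css$/i.test(url);
});

var cssRequest = mockRequest();
var jsRequest = mockRequest();
blockCss({ url: 'http://example.com/main.css' }, cssRequest);
blockCss({ url: 'http://example.com/app.js' }, jsRequest);
console.log(cssRequest.aborted, jsRequest.aborted); // prints "true false"
```

Passing a predicate instead of a fixed pattern keeps the handler independent of the matching rule, so the same factory could block other kinds of resources as well.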

There is an example, loadurlwithoutcss.js, which demonstrates this feature. In fact, combining this idea with the previous BBC News site capture, we can produce the following screenshots. The left side shows the normal page (see my previous blog post on web clipping), while the right side shows what happens when none of the CSS files are loaded at all.

[Figure: decssify, the BBC News mobile page rendered normally (left) and without style sheets (right)]

The script which produces the above image is as follows:

var page = require('webpage').create();

// Identify as a mobile browser so that the server sends the mobile layout
page.settings.userAgent = 'WebKit/534.46 Mobile/9A405 Safari/7534.48.3';
// viewportSize is a property of the page itself, not of page.settings
page.viewportSize = { width: 400, height: 600 };

// Abort every request for a style sheet (needs PhantomJS 1.9 or later)
page.onResourceRequested = function (requestData, request) {
    if (/http:\/\/.+?\.css$/i.test(requestData.url)) {
        console.log('Skipping', requestData.url);
        request.abort();
    }
};

page.open('http://m.bbc.co.uk/news/health', function (status) {
    if (status !== 'success') {
        console.log('Unable to load BBC!');
        phantom.exit(1);
    } else {
        // Wait a moment so that the page can finish its layout
        window.setTimeout(function () {
            // Render only the top 400x600 area of the page
            page.clipRect = { left: 0, top: 0, width: 400, height: 600 };
            page.render('bbc_unstyled.png');
            phantom.exit();
        }, 1000);
    }
});

It is pretty similar to the previous version. The new addition is a handler for onResourceRequested, which checks whether the requested URL points to a style sheet and, if so, aborts the request. When the script runs, it displays the following messages:

Skipping http://static.bbci.co.uk/frameworks/barlesque/2.45.9/mobile/3.5/style/main.css
Skipping http://static.bbci.co.uk/bbcdotcom/0.3.184/style/mobile/bbccom.css
Skipping http://static.bbci.co.uk/news/1.7.1-259/stylesheets/core.css
Skipping http://static.bbci.co.uk/news/1.7.1-259/stylesheets/compact.css

which indicates that these four style sheets won’t be part of the rendered output.
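The check in the handler only matches http URLs that end exactly in .css. Sites often serve style sheets over https or append a cache-busting query string such as ?v=2, and those would slip through. A slightly more tolerant check, sketched here as a standalone function with the hypothetical name isStyleSheetUrl, accepts both cases. This variant is my assumption about what a stricter test might look like, not what the script above matches.

```javascript
// Hypothetical helper: accepts http and https URLs (any letter case) whose
// path ends in .css, optionally followed by a query string. The path part
// [^?#]+ cannot cross a "?", so a query parameter that merely ends in .css
// is not mistaken for a style sheet.
function isStyleSheetUrl(url) {
    return /^https?:\/\/[^?#]+\.css(\?.*)?$/i.test(url);
}

// In a PhantomJS handler it would replace the inline regular expression:
//   if (isStyleSheetUrl(requestData.url)) { request.abort(); }

var samples = [
    'http://static.bbci.co.uk/news/1.7.1-259/stylesheets/core.css', // true
    'https://example.com/style.css?v=2',                            // true
    'http://example.com/app.js',                                    // false
    'http://example.com/page?file=theme.css'                        // false
];
samples.forEach(function (url) {
    console.log(isStyleSheetUrl(url), url);
});
```

Any check based on the file extension still misses style sheets served from URLs without a .css suffix, so the list of skipped URLs printed by the script is worth a quick look.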

The entire process is rather straightforward. Because PhantomJS is headless and does not need a display server, you can even run it on an Amazon EC2 instance. It should not be too difficult to include this kind of spartan rendering of your web site as another layer in a defensive development workflow.

What do you plan to de-CSS-ify today?




Published at DZone with permission of Ariya Hidayat, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)