When doing DOM-based performance testing you frequently need to pick a sample HTML document to work against. This raises the question: What is a good, representative, HTML document?
For many people a good document seems to file into one of two categories:
- A large web page with a lot of content. When we did our initial performance testing with jQuery we used Shakespeare's As You Like It (lots of elements, but a very flat structure) - Mootools uses an old draft of the W3C CSS3 Selectors recommendation. This has a lot of content, as well - thousands of elements with a medium depth structure.
- A popular web page. Common recommendations include 'yahoo.com' and 'microsoft.com'.
What's troubling is that there doesn't really seem to be any way to determine what a representative web page actually is. There's a couple things that I'd like to propose as being good indicators:
- Standards-based semantic markup (including strong use of attributes: id, class, etc.).
- Non-trivial file size and element count (testing the scalability of the performance).
- Some use of tables and form elements (frequent inclusions in most web pages).
- Strong use of CSS (frequently implies a deep element structure, in order to accommodate complex layouts).
- Pervasive use of JavaScript. If JavaScript performance analysis is being done it's probably good to start with a page that already has a desire to use it.
I think, out of all of these aspects, one page stands out: CNN.com . Here's why:
- It uses semantic HTML 4 markup with a lot of classnames and ids.
- It's about 92kb in size and has 1648 elements in it.
- It has some tables (seemingly for legacy material) and some forms (search forms, drop-downs).
- Lots of CSS and JavaScript. Makes good use of Prototype which, at least, should show an desire in having a page with performant JavaScript.
- It's, also, imperfect. I would consider this to be desirable. Rarely are pages completely without fault - and the heavy use of embedded JavaScript, ads, and non-semantic tables helps to add a stark dash of reality.
Of course, analysis could always be done against multiple pages and be viewed in aggregate but we need a place to start. So, what do you think; is CNN a good, representative, page for doing performance analysis against?
