An Overview of the W3C HTML5 Document Outliner Algorithm
HTML4's use of div and heading (h1 to h6) tags to describe a document's structure has many limitations. First, the div tag acts as a generic block-level division. That's all well and good until you start nesting DIVs. Moreover, without a descriptive ID or class attribute, there's no way to know whether a given div's function is primarily presentational or semantic.
With regard to headings, it is not possible to describe a subtitle or secondary title in HTML4. And since every section is part of the document outline, there is no way to define a section containing information related to the site as a whole, such as logos, menus, or a table of contents. HTML5 introduces several new elements to describe the structure of a web document with standard semantics.
This article specifically focuses on HTML5's Header and Section elements and describes how to use them to define the desired outline for your documents.
Why New Tags?
The definitive new structural elements include header, hgroup, article, section, aside, footer, and nav. Others have still not solidified their standing in the spec. As new tags, they are meant to complement the existing DIV and heading tags, not replace them. Their role is to help organize page content according to what the content is. Hence, it is not about "where" the content goes on the page, but rather "what" its relationship is to the other content on the page.
When dealing with existing content, start with a bird's eye view of the page and gradually work your way to more internal content. A good starting place is to divide content by sections and headers.
The Document Outline
Traditionally, a document's outline has been defined by its section headings, whereby the first element of heading content (an h1 to h6 or hgroup tag) in an element of sectioning content represents the heading for that section. Subsequent headings of equal or higher rank start new (implied) sections, while headings of lower rank start implied subsections that are part of the previous one. In both cases, the heading element represents the heading of the implied section.
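The implied-section rule can be sketched in a few lines of JavaScript. This is a minimal illustration, not the W3C algorithm or the H5O implementation: `buildImpliedOutline` and the `{ level, text }` heading objects are hypothetical names, and the sketch ignores explicit sectioning elements entirely.

```javascript
// Minimal sketch (hypothetical helper, not the full W3C outline algorithm).
// Each heading is { level, text }, where level 1 stands for <h1> (the highest
// rank) and level 6 for <h6> (the lowest rank).
function buildImpliedOutline(headings) {
  const root = { level: 0, text: null, children: [] };
  const stack = [root]; // chain of currently open implied sections

  for (const { level, text } of headings) {
    // A heading of equal or higher rank closes every open section
    // whose own heading is of the same or lower rank.
    while (stack[stack.length - 1].level >= level) {
      stack.pop();
    }
    // What remains on top is the nearest higher-ranked section (or the root),
    // so the new heading opens a subsection of it.
    const section = { level, text, children: [] };
    stack[stack.length - 1].children.push(section);
    stack.push(section);
  }
  return root.children;
}

const impliedOutline = buildImpliedOutline([
  { level: 1, text: "Site title" },
  { level: 2, text: "News" },
  { level: 3, text: "Local" },
  { level: 2, text: "Sports" },
]);
console.log(JSON.stringify(impliedOutline, null, 2));
```

In this sample, "News" and "Sports" become sibling subsections of "Site title", while "Local" nests under "News".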
That sounds simple enough. However, the inclusion of the new section tag complicates matters a bit, because the rank of a section depends partially on how the tag is used. For instance, section elements are always considered subsections of either their nearest ancestor sectioning root (such as body) or their nearest ancestor element of sectioning content, depending on which of the two is closest. This outline also supersedes any implied sections other headings may have created. The lesson here is that it's best to explicitly wrap sections in elements of sectioning content, and not just rely on the implicit sections generated by the headings.
As a general rule, the W3C strongly recommends either using only H1 elements or using a heading of the appropriate rank for the section's nesting level.
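To illustrate why explicit sectioning makes heading rank less important, here is a small sketch that derives outline depth purely from nesting. The `{ heading, sections }` object shape and the `listOutline` function are assumptions made for this example, and the "Untitled Section" placeholder is just an illustrative label for a section that has no heading.

```javascript
// Sketch only: models explicit sectioning elements as plain objects.
// `heading` is the section's heading text (or null) and `sections` holds its
// nested sectioning elements. The heading's rank is deliberately not consulted.
function listOutline(section, depth = 0) {
  const label = section.heading ?? "Untitled Section";
  const lines = ["  ".repeat(depth) + label];
  for (const child of section.sections ?? []) {
    lines.push(...listOutline(child, depth + 1));
  }
  return lines;
}

const samplePage = {
  heading: "Daily News",
  sections: [
    { heading: "World", sections: [{ heading: "Europe" }] },
    { heading: null }, // a section with no heading of its own
  ],
};
const explicitOutline = listOutline(samplePage);
console.log(explicitOutline.join("\n"));
```

Because depth comes from nesting alone, every heading in `samplePage` could be an H1 and the resulting outline would be the same.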
The boolean argument to asHTML() tells the function whether or not we want links to the sections in the document. Here is a portion of the HTML produced (I added the indentations):
<ol>
  <li><a href="#h5o-1"><em>No text content inside H3</em></a></li>
  <li><a href="#h5o-2">Two dead in Virginia Tech shooting, suspect on loose</a></li>
  <li><a href="#h5o-3">Attawapiskat consultant to be paid $180,000</a></li>
  <li><a href="#h5o-4">Gallery: Actors with the most bang for the buck</a></li>
  <li><a href="#h5o-5">Deer and ram in love pose ethical dilemma</a></li>
  <li><a href="#h5o-6">Gallery: Crazy cool Christmas lights</a></li>
  <li><a href="#h5o-7">Virginia Tech</a></li>
  <li><a href="#h5o-8">Attawapiskat</a></li>
  <li><a href="#h5o-9">Profitable Actors</a></li>
  <li><a href="#h5o-10">Science</a></li>
  <li><a href="#h5o-11">Christmas lights</a>
    <ol>
      <li><a href="#h5o-12">Headlines</a></li>
      <li><a href="#h5o-13">Canal Killings</a></li>
    </ol>
  </li>
  <li><a href="#h5o-14">'My children did a lot of cruelty toward me': Shafia</a>
    <ol>
      <li><a href="#h5o-15">Virginia Tech</a></li>
    </ol>
  </li>
  <li><a href="#h5o-16">2 dead in Virginia Tech shooting, suspect on loose</a>
    <ol>
      <li><a href="#h5o-17">Albert Pujols</a></li>
    </ol>
  </li>
  <li><a href="#h5o-18">Albert Pujols heading to Angels</a>
    <ol>
      <li><a href="#h5o-19"><em>No text content inside H3</em></a></li>
    </ol>
  </li>
  <li><a href="#h5o-20">Markets drop as ECB disappoints</a>
    <ol>
      <li><a href="#h5o-21">Today's Photos »</a></li>
      <li><a href="#h5o-22">Popular Links</a></li>
      <li><a href="#h5o-23">The Daily Bright »</a></li>
      <li><a href="#h5o-24">MythBusters</a></li>
    </ol>
  </li>
  <li><a href="#h5o-25">Video: Cannonball hits home in 'MythBusters' TV shoot</a>
    <ol>
      <li><a href="#h5o-26">Border deal</a></li>
    </ol>
  </li>
  <li><a href="#h5o-27">What Canada-U.S. border deal means</a>
    <ol>
      <li><a href="#h5o-28">Television</a></li>
    </ol>
  </li>
  <li><a href="#h5o-29">Will Ryan Seacrest replace Matt Lauer?</a>
    <ol>
      <li><a href="#h5o-30">Metallica</a></li>
    </ol>
  </li>
  <p>...</p>
</ol>
...which renders the following in a browser:
- No text content inside H3
- Two dead in Virginia Tech shooting, suspect on loose
- Attawapiskat consultant to be paid $180,000
- Gallery: Actors with the most bang for the buck
- Deer and ram in love pose ethical dilemma
- Gallery: Crazy cool Christmas lights
- Virginia Tech
- Attawapiskat
- Profitable Actors
- Science
- Christmas lights
  - Headlines
  - Canal Killings
- 'My children did a lot of cruelty toward me': Shafia
  - Virginia Tech
- 2 dead in Virginia Tech shooting, suspect on loose
  - Albert Pujols
- Albert Pujols heading to Angels
  - No text content inside H3
- Markets drop as ECB disappoints
  - Today's Photos »
  - Popular Links
  - The Daily Bright »
  - MythBusters
- Video: Cannonball hits home in 'MythBusters' TV shoot
  - Border deal
- What Canada-U.S. border deal means
  - Television
- Will Ryan Seacrest replace Matt Lauer?
  - Metallica
Don't be surprised if the links don't do anything: the content that they refer to is not in this document. Even if it were, the function is unobtrusive in that it only links to IDs that already exist in the document. Hence, if the Metallica section possessed an ID attribute, the link would point to it. Since it does not, the H5O algorithm generates its own, but does not insert it into the DOM. The generated link ID is in the format of "'h5o-'+ (++linkCounter)", giving the Metallica section an ID of h5o-30 (linked as #h5o-30).
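The ID rule described here might look roughly like the following. It is a simplified sketch, not the H5O source: `createLinkIdGenerator` is a hypothetical name, and sections are modeled as plain objects with an optional `id` property rather than DOM nodes. The sketch also assumes the counter advances only when an ID actually has to be generated.

```javascript
// Hypothetical sketch of the ID rule: reuse an existing ID, otherwise
// generate "h5o-N" from a running counter without modifying the section.
function createLinkIdGenerator() {
  let linkCounter = 0;
  return function linkIdFor(section) {
    if (section.id) {
      return section.id; // an existing ID is used as-is
    }
    return "h5o-" + (++linkCounter); // same format the article quotes
  };
}

const linkIdFor = createLinkIdGenerator();
const generatedIds = [
  linkIdFor({ title: "Headlines" }),
  linkIdFor({ title: "Metallica", id: "metallica" }),
  linkIdFor({ title: "Science" }),
];
console.log(generatedIds);
```

Only the two sections without an `id` receive generated values; the "Metallica" object keeps the ID it already had.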
The inclusion of links certainly is a great tool for generating a table of contents. Just be sure to assign IDs to each section if you want to save yourself a bit of work.
Sections that do not contain a child heading are labeled as "Untitled" sections so as to preserve the outline, as seen in the code below:
<h1>Shared Web Workers Help Spread the News</h1>
<p>After being a fixture in languages like Java for years, Web Workers have now made multi-threading in Web applications a reality. Right now, they are supported as of Opera 10.6, Safari 4.0, Chrome 11.0, Firefox 4.0 and are expected to be included in IE 10…</p>
<h2>The Difference between the Two</h2>
...which renders the following in a browser:
Rules for Header Groups
The outliner will disregard all headings within an hgroup element except for the one with the highest rank. For example, if it contains an <h1>, an <h2>, and an <h3>, only the <h1>'s text will be used as the section title in the outline. Thus:
<h1>RobGravelle.com</h1>
<h2>It's all about the music.</h2>
<h2>News</h2>
<p>Rob voted best guitarist of 2011…by his wife.</p>
...would produce the following outline:
For more information, please visit the W3C markup specification.
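The header-group rule can be expressed as picking the heading with the smallest level number. The following is an illustrative sketch under assumed names (`hgroupTitle` and `{ level, text }` objects), not code from the specification or from any outliner library. Resolving ties in favor of the first heading is an assumption of this sketch, as is grouping the site title with its tagline.

```javascript
// Sketch: choose the title an outliner would show for a header group.
// Level 1 stands for <h1> (highest rank), level 6 for <h6> (lowest rank).
function hgroupTitle(headings) {
  if (headings.length === 0) {
    return null; // nothing to use as a title
  }
  let best = headings[0];
  for (const heading of headings) {
    // Strict comparison keeps the first heading when ranks are tied.
    if (heading.level < best.level) {
      best = heading;
    }
  }
  return best.text;
}

const groupTitle = hgroupTitle([
  { level: 1, text: "RobGravelle.com" },
  { level: 2, text: "It's all about the music." },
]);
console.log(groupTitle);
```

The lower-ranked tagline is ignored, so only the level 1 text would appear in the outline for that group.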
Limitations of HTML5 Document Sectioning
One thing that HTML5 does not include is a mechanism that would allow semantic information to be added to a document as required. So for the time being, we'll have to make the current set of new tags work for us. Hopefully these will eventually evolve into a quasi-language of their own, much like CSS did. If and when that happens, HTML5 document sections will live up to their promise of doing for page semantics what CSS did for the look of our web pages.