One of the stumbling blocks in working out how new (and old) HTML5 features affect accessibility is the lack of publicly accessible usage data. Without data it is difficult to argue for the inclusion of features in HTML5 or to work out how features should be accessibility supported. I have made an initial attempt to rectify this by collecting the HTML content of the home pages of the top 10,000 web sites.
I spent most of the Easter long weekend collecting the HTML pages. The source for the “top 10,000” site URLs was this URL list I found on Pastebin. I used HTTrack Website Copier to capture the HTML files. The initial pass was somewhat affected by redirects, so I went through the error log and collected a second list of URLs from the captured pages that had resulted in “page has moved” files. The resulting 8915 HTML pages come from these 2 sets of URLs. The HTML content (including URL lists) is provided as a zip file:
Top 10000 HTML files zip file – 121 MB (Please only download if you are going to make use of the data)
hgroup element usage
I have only just started to analyse the data. The first analysis is of the new HTML5 hgroup element, and so far it is only a simple count of instances of its use. No attempt has yet been made, for example, to analyse what percentage of its use conforms to HTML5 author conformance requirements.
Of the 8915 HTML pages, 79 (0.89%) were found to use the HTML5 hgroup element. A total of 418 instances of hgroup were found within those 79 pages.
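For anyone who wants to repeat this kind of count on the zip file contents, the sketch below shows one way it could be done in Python. It is an illustration only, not the method used to produce the figures above. It assumes the pages have been unzipped into a local folder (the name `top10000` is hypothetical) and that each page has an `.html` extension.

```python
from html.parser import HTMLParser
from pathlib import Path


class HgroupCounter(HTMLParser):
    """Counts <hgroup> start tags in a single HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lower case, so <HGROUP> is counted too.
        if tag == "hgroup":
            self.count += 1


def count_hgroup(html_text):
    """Return the number of hgroup elements opened in html_text."""
    parser = HgroupCounter()
    parser.feed(html_text)
    parser.close()
    return parser.count


def summarise(folder):
    """Return (pages scanned, pages using hgroup, total hgroup instances)."""
    pages = 0
    pages_with_hgroup = 0
    instances = 0
    for path in Path(folder).rglob("*.html"):
        pages += 1
        # Encodings vary widely across sites; undecodable bytes are replaced
        # rather than aborting the run (an assumption of this sketch).
        n = count_hgroup(path.read_text(encoding="utf-8", errors="replace"))
        if n:
            pages_with_hgroup += 1
            instances += n
    return pages, pages_with_hgroup, instances


if __name__ == "__main__":
    # "top10000" is a hypothetical folder name for the unzipped files.
    pages, with_hgroup, instances = summarise("top10000")
    if pages:
        share = 100 * with_hgroup / pages
        print(f"{with_hgroup} of {pages} pages ({share:.2f}%) use hgroup")
        print(f"{instances} hgroup instances in total")
```

Using a parser rather than a plain text search means that the word "hgroup" inside comments, scripts or page text is not counted as an element.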
Inclusion of hgroup in HTML5
Note: I am a proponent of the removal and/or replacement of hgroup in HTML5. There are currently 5 change proposals on this subject being reviewed by the W3C HTML working group chairs:
- Change Proposal: replace hgroup with the subline element
- Change Proposal: no-change hgroup
- Change Proposal: replace hgroup with a simple element
- Change Proposal: remove hgroup add an outlineMask attribute
- Change Proposal: Replace <hgroup> with an element that has a simple content model and backwards compatibility.