Semantic Data Extractor
Every so often, someone writes to me or to the public-qa-dev mailing list to report bugs, or simply to say thanks for the semantic data extractor.
I’m always pleasantly surprised to hear that what started as a 10-minute demonstrator of the semantics attached to HTML is actually used as a tool by a number of developers.
With a name such as “semantic data extractor”, it was a bit of a shame that the tool didn’t highlight the use of GRDDL or RDFa on pages that use either of these technologies; I have just added detection of both to the extractor.
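To give a rough idea of what this kind of detection involves, here is a minimal sketch in Python. It is not the extractor's own code, and the names `SemanticHints`, `detect` and `RDFA_ATTRS` are made up for the illustration. It assumes that a page signals GRDDL by listing the GRDDL data-view profile in a `profile` attribute, and that it uses RDFa if any element carries one of a handful of RDFa-specific attributes; both checks are heuristics rather than a full implementation of either specification.

```python
from html.parser import HTMLParser

# Profile URI that a page lists (normally on <head>) to announce GRDDL.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

# Attributes that plain HTML does not define but RDFa does (assumed list,
# deliberately leaving out rel/rev/content/href, which HTML already uses).
RDFA_ATTRS = ("about", "property", "typeof", "resource", "datatype")


class SemanticHints(HTMLParser):
    """Hypothetical helper: records whether GRDDL or RDFa hints were seen."""

    def __init__(self):
        super().__init__()
        self.uses_grddl = False
        self.uses_rdfa = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases names; valueless attributes have value None.
        attrs = dict(attrs)
        profiles = (attrs.get("profile") or "").split()
        if GRDDL_PROFILE in profiles:
            self.uses_grddl = True
        if any(name in attrs for name in RDFA_ATTRS):
            self.uses_rdfa = True


def detect(html):
    """Return which of the two technologies the markup appears to use."""
    parser = SemanticHints()
    parser.feed(html)
    parser.close()
    return {"grddl": parser.uses_grddl, "rdfa": parser.uses_rdfa}
```

Calling `detect()` on a page's source returns a small dictionary with one boolean per technology, which is enough to decide whether to show a "this page uses GRDDL/RDFa" notice.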
As a bonus, I have also added detection of non-semantic markup: at this time, it will detect purely-wrapping <div>, empty <span>, and tables with a single row or a single column (which are likely to be layout tables); if you have suggestions for detecting other non-semantic