Doing for Data What HTML Did for Documents

I’ve been saying for quite some time that the first thing the Semantic Web will accomplish is “doing for data what HTML did for documents.” HTML made it possible for everyone to publish, access and connect documents on the Internet. RDF and OWL — the languages of the Semantic Web — make it possible to do the same thing for data records — structured fragments or collections of data. Today data records exist in a variety of non-integrated formats and standards. RDF and OWL provide a common language for publishing, accessing and linking data around the Web.

Tim Berners-Lee recently gave Congressional testimony on the future of the Web where he explained this as well:

“Digital information about nearly every aspect of our lives is being created at an astonishing rate. Locked within all of this data is the key to knowledge about how to cure diseases, create business value, and govern our world more effectively. The good news is that a number of technical innovations (RDF which is to data what HTML is to documents, and the Web Ontology Language (OWL) which allows us to express how data sources connect together) along with more openness in information sharing practices are moving the World Wide Web toward what we call the Semantic Web. … The Semantic Web will enable better data integration by allowing everyone who puts individual items of data on the Web to link them with other pieces of data using standard formats.”

This first stage of evolution of the Semantic Web can be called the “data web”: a Web of connected data. Once this is in place, with lots of RDF and OWL data records that any application can query via SPARQL interfaces, the Web will start to function like a giant, globally distributed database. This in turn will enable a new era of Web applications that understand the structure and interconnections of data around the Web. One example is a search engine that could search for different kinds of product listings across many different sites that publish RDF/OWL data for their product records.
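To make the "globally distributed database" idea a little more concrete, here is a minimal sketch in plain Python. It does not use a real RDF library or SPARQL. It only models records as (subject, predicate, object) triples, which is the basic shape of RDF data, and runs a simple pattern query over triples merged from two sites. The site URIs, the `ex:` property names, and the `match` and `products_under` helpers are all hypothetical, invented for illustration.

```python
# Hypothetical triples published by two independent sites.
# Each triple is (subject, predicate, object), the basic shape of RDF data.
SITE_A = [
    ("http://a.example/item/1", "rdf:type", "ex:Product"),
    ("http://a.example/item/1", "ex:name", "Trail Bike"),
    ("http://a.example/item/1", "ex:price", 450),
]
SITE_B = [
    ("http://b.example/p/77", "rdf:type", "ex:Product"),
    ("http://b.example/p/77", "ex:name", "City Bike"),
    ("http://b.example/p/77", "ex:price", 300),
    ("http://b.example/post/9", "rdf:type", "ex:BlogPost"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def products_under(triples, max_price):
    """Names of products priced at or below max_price, from any source."""
    results = []
    for subject, _, _ in match(triples, p="rdf:type", o="ex:Product"):
        names = match(triples, s=subject, p="ex:name")
        prices = match(triples, s=subject, p="ex:price")
        if names and prices and prices[0][2] <= max_price:
            results.append(names[0][2])
    return sorted(results)


# Because both sites use the same vocabulary, merging is just concatenation.
merged = SITE_A + SITE_B
print(products_under(merged, 400))
```

The point of the sketch is that the query function knows nothing about either site. It only knows the shared vocabulary, which is roughly what a SPARQL query over published RDF would rely on.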

Today such a data-aware search engine is hard to make: each new site to be searched has to be integrated in a completely custom way. This is sometimes referred to as the challenge of doing “deep Web search.” If product listings of various types were at least mapped to a common ontology (or even a set of ontologies), integrating disparate types of product data would become easier. As it becomes easier to operate across data around the Web, data-aware applications can be made to do more sophisticated things with less effort, and they in turn will produce more data for the data web in a virtuous cycle. That is the first step. After this we can begin to make smart applications that filter, make sense of, track, suggest and organize data, and otherwise assist users. First we have to have rich data (the data web); then we can move to smart applications from there.
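The idea of mapping differently shaped product listings to a common ontology can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two record layouts, the `ex:name` and `ex:priceUSD` property names, and the `MAPPINGS` table are hypothetical, and a real system would express the shared vocabulary in RDF/OWL rather than in Python dictionaries.

```python
# Hypothetical raw records as two sites might expose them today,
# each with its own field names and units.
SITE_A_RECORD = {"title": "Trail Bike", "cost_usd": "450.00"}
SITE_B_RECORD = {"productName": "City Bike", "priceCents": 30000}

# Hypothetical per-site mappings to a shared vocabulary.
# Each entry maps a common property to a function that extracts it
# from that site's raw record.
MAPPINGS = {
    "site_a": {
        "ex:name": lambda r: r["title"],
        "ex:priceUSD": lambda r: float(r["cost_usd"]),
    },
    "site_b": {
        "ex:name": lambda r: r["productName"],
        "ex:priceUSD": lambda r: r["priceCents"] / 100,
    },
}


def to_common(site, record):
    """Translate one site-specific record into the shared vocabulary."""
    return {prop: extract(record) for prop, extract in MAPPINGS[site].items()}


listings = [
    to_common("site_a", SITE_A_RECORD),
    to_common("site_b", SITE_B_RECORD),
]

# One generic query now works across both sources.
cheapest = min(listings, key=lambda item: item["ex:priceUSD"])
print(cheapest["ex:name"])
```

Each new site still needs a mapping, but it is written once against the shared vocabulary instead of once per consuming application, which is where the reduction in integration effort would come from.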
