I’m posting this in response to a recent post by Tim O’Reilly that focused on disambiguating what the Semantic Web is and is not, as well as on the subject of collective intelligence. I generally agree with Tim’s post, but I have a few points to add by way of clarification. In particular, in my opinion the Semantic Web is all about collective intelligence, on several levels. I would also suggest that the term "hyperdata" may be a useful way to express what the Semantic Web is really about.
What Makes Something a Semantic Web Application?
I agree with Tim that the term "Semantic Web" refers to the use of a particular set of emerging W3C open standards, including RDF, OWL, SPARQL, and GRDDL. A key requirement for an application to have "Semantic Web inside," so to speak, is that it uses, or is at least compatible with, basic RDF. An alternative definition is that a "Semantic Web" application must make at least some use of an ontology expressed in a W3C standard.
Semantic Versus Semantic Web
Many applications and services claim to be "semantic" in one way or another, but that does not make them "Semantic Web." Semantic applications include any applications that can make sense of meaning, particularly in language (unstructured text) and, in some cases, in structured data. By this definition, all of today’s search engines are somewhat "semantic," but few would qualify as "Semantic Web" apps.
The Difference Between "Data On the Web" and a "Web of Data"
The Semantic Web is principally about working with data in a new and hopefully better way and, if desired, making that data available on the Web in an open fashion so that other applications can understand and reuse it more easily. We call this idea "The Data Web": the notion that we are transforming the Web from a distributed file server into something more like a distributed database.
Instead of web pages, the basic objects are pieces of data (triples) and records formed from them (sets, trees, graphs, or objects composed of triples). A Web page can contain any number of triples, and there can also be triples on the Web that do not live in Web pages at all; they can come directly from databases, for example.
One might respond by noting that there is already a lot of data on the Web, in XML and other formats. How is the Semantic Web different from that? What is the difference between "Data on the Web" and the idea of "The Data Web"?
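To make the idea of triples coming "directly from databases" concrete, here is a minimal Python sketch. It assumes a simple mapping in which each row becomes a subject URI, each non-key column a predicate, and each cell value an object. The `http://example.org/` base URI, the `person` table, and the `row_to_triples` helper are hypothetical; real mapping tools treat keys, datatypes, and foreign keys with much more care.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Hypothetical base URI, for illustration only.
BASE = "http://example.org/"


def row_to_triples(table: str, key: str, row: dict) -> list:
    """Map one database row to triples.

    The primary-key value names the subject, every other column becomes a
    predicate, and each cell value becomes a plain-string object.
    """
    subject = f"{BASE}{table}/{row[key]}"
    return [
        Triple(subject, f"{BASE}{table}#{column}", str(value))
        for column, value in row.items()
        if column != key
    ]


# Rows as they might come back from a hypothetical `person` table.
rows = [
    {"id": 1, "name": "Sue", "city": "Palo Alto"},
    {"id": 2, "name": "Raj", "city": "Menlo Park"},
]
triples = [t for row in rows for t in row_to_triples("person", "id", row)]
for t in triples:
    print(t.subject, t.predicate, t.object)
```

None of these triples has to appear in an HTML page; a service could publish them straight from the table.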
The best answer to this question that I have heard came from Dean Allemang at a recent Semantic Web SIG in Palo Alto. Dean said, "Sure there is data on the Web, but it’s not actually a web of data." The difference is that in the Semantic Web paradigm, data can be linked to other data in other places: it’s a web of data, not just data on the Web.
I call this concept of interconnected data "hyperdata." It does for data what hypertext did for text. I’m probably not the originator of the term, but I think it is a very useful term and analogy for explaining the value of the Semantic Web.
Another way to think of it is that the current Web is a big graph of interconnected nodes, where the nodes are usually HTML documents, but in the Semantic Web we are talking about a graph of interconnected data statements that can be as general or specific as you want. A data record is a set of data statements about the same subject, and they don’t have to live in one place on the network — they could be spread over many locations around the Web.
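The idea that one record can be scattered across the network can be sketched in a few lines of Python. Assuming three hypothetical sites that each publish some (subject, predicate, object) statements, `assemble_record` gathers everything said about one subject and notes where each statement came from. All URIs and the `vocab#` predicates are illustrative placeholders, not a real vocabulary.

```python
from collections import defaultdict

# Hypothetical subject and statements published independently by three sites.
SUBJECT = "http://example.org/id/item-42"
sources = {
    "http://site-1.example/data": [
        (SUBJECT, "http://example.org/vocab#title", "Hyperdata"),
    ],
    "http://site-2.example/data": [
        (SUBJECT, "http://example.org/vocab#topic", "http://example.org/id/semantic-web"),
        ("http://example.org/id/other", "http://example.org/vocab#title", "Unrelated"),
    ],
    "http://site-3.example/data": [
        (SUBJECT, "http://example.org/vocab#topic", "http://example.org/id/collective-intelligence"),
    ],
}


def assemble_record(sources, subject):
    """Collect every statement about `subject`, remembering its origin."""
    record = defaultdict(list)
    for origin, triples in sources.items():
        for s, p, o in triples:
            if s == subject:
                record[p].append((o, origin))
    return dict(record)


record = assemble_record(sources, SUBJECT)
for predicate, values in record.items():
    for value, origin in values:
        print(predicate, value, "(from", origin + ")")
```

The record exists only as the union of statements that happen to share a subject URI; no single site holds all of it.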
A statement to the effect of "Sue lives in Palo Alto" could exist on site A and refer to a URI defining Sue on site B, a URI defining "lives in" on site C, and a URI defining "Palo Alto" on site D. That’s a web of data. What’s cool is that anyone can potentially add statements to this web of data, so it can be completely emergent.
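Here is the Sue example as a rough Python sketch, assuming made-up `*.example` hosts for sites A through E. It shows two things: the single statement on site A uses identifiers minted on three other sites, and a statement added independently by a fifth site joins the same graph simply because it reuses the same URI for Palo Alto. The `livesIn` and `locatedIn` predicates are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical URIs: each term is minted on a different site.
SUE = "http://site-b.example/people/sue"              # defined on site B
LIVES_IN = "http://site-c.example/vocab#livesIn"      # defined on site C
PALO_ALTO = "http://site-d.example/places/palo-alto"  # defined on site D

# Site A publishes the statement that ties the three together.
site_a = {(SUE, LIVES_IN, PALO_ALTO)}

# Site E, independently, says something about the same Palo Alto URI.
site_e = {(PALO_ALTO, "http://site-e.example/vocab#locatedIn",
           "http://site-d.example/places/california")}


def hosts(triple):
    """Return the host name behind each URI in a triple."""
    return tuple(urlparse(term).netloc for term in triple)


def reachable(graph, start):
    """Follow links from `start` transitively, collecting every node reached."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for s, _, o in graph:
            if s == node and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


web_of_data = site_a | site_e  # merging two sites' data is just set union
print(hosts(next(iter(site_a))))
print(sorted(reachable(web_of_data, SUE)))
```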
The Semantic Web is Built by and for Collective Intelligence
This is where I think Tim and others who write about the Semantic Web may be missing an essential point. The Semantic Web is in fact highly conducive to collective intelligence. It doesn’t require machines to add all the statements using fancy AI. In a next-generation folksonomy, for example, tags that human users create manually can easily be encoded as RDF statements. Doing this yields many new capabilities, such as linking tags to concepts that define their meaning and to other related tags.
Humans can add tags that become Semantic Web content, either manually or with help from software. Humans can also fill out forms that generate RDF behind the scenes, just as filling out a blog posting form generates HTML, XML, Atom, and so on. Humans don’t actually write all that code; software does it for them. Yet blogs and wikis, for example, are considered collective intelligence tools.
So the concept of folksonomy and tagging is truly complementary to the Semantic Web; the two are not mutually exclusive at all. In fact, the Semantic Web, or at least "Semantic Web Lite" (RDF plus only basic use of OWL and basic SPARQL), is capable of modelling and publishing any data in the world in a more open way.
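As a sketch of how a human-entered tag might be encoded, the Python below turns one tag into triples. It borrows three real SKOS properties (`skos:prefLabel`, `skos:related`, `skos:exactMatch`) for the tag’s label, its related tags, and the concept that defines its meaning. The `taggedWith` predicate, the tag namespace, the blog post URI, and the concept URI are hypothetical stand-ins for whatever tag ontology an application actually adopts.

```python
from urllib.parse import quote

SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/vocab#"   # hypothetical vocabulary
TAGS = "http://example.org/tags/"  # hypothetical tag namespace


def tag_uri(label):
    """Give each normalized tag label a stable URI."""
    return TAGS + quote(label.strip().lower())


def tag_to_triples(resource, label, concept=None, related=()):
    """Encode one human-entered tag as a set of triples."""
    tag = tag_uri(label)
    triples = {
        (resource, EX + "taggedWith", tag),
        (tag, SKOS + "prefLabel", label),
    }
    if concept:
        # Link the tag to a concept that pins down its meaning.
        triples.add((tag, SKOS + "exactMatch", concept))
    for other in related:
        # Link the tag to other, related tags.
        triples.add((tag, SKOS + "related", tag_uri(other)))
    return triples


post = "http://example.org/posts/hyperdata"
triples = tag_to_triples(post, "semweb",
                         concept="http://example.org/concepts/SemanticWeb",
                         related=["rdf", "linked data"])
for t in sorted(triples):
    print(t)
```

The user only typed "semweb" and a couple of related words; the software produced the triples.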
Any application that uses data could do everything it does with these technologies. Every form of social, user-generated content and community could, and probably will, be implemented with RDF in one manner or another within the next decade or so. RDF, OWL, and SPARQL are especially well suited to social networking services: the data model is a much better match for the structure of the data, the network of users, and the kinds of queries those services need to run.
The notion that the Semantic Web is somehow not about folksonomy needs to be corrected. Take Metaweb’s Freebase, for example. Freebase is what I call a "folktology": an emergent, community-generated ontology. Users collaborate to extend both the ontology and the knowledge base populated within it. That’s a wonderful example of collective intelligence, user-generated content, and semantics. (Technically, to my knowledge, they are not using RDF for this, but from what I can see their data model is functionally equivalent, and I would expect at least a SPARQL interface from them eventually.)
But that’s not all. Check out TagCommons, the Tag Ontology discussion, and the SKOS ontology, all of which are working on semantic ways of characterizing simple tags in order to enrich folksonomies and enable better collective intelligence.
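To illustrate the kind of social-network query meant here, the sketch below finds friends of friends over `foaf:knows` triples. The FOAF property URI is real; the people and their URIs are invented. The SPARQL string shows roughly how the same question could be asked of a triple store; it is not executed here, and `friends_of_friends` is a hand-written equivalent of that single graph pattern.

```python
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
P = "http://example.org/people/"  # hypothetical person URIs

graph = {
    (P + "sue", FOAF_KNOWS, P + "raj"),
    (P + "raj", FOAF_KNOWS, P + "ana"),
    (P + "raj", FOAF_KNOWS, P + "sue"),
    (P + "ana", FOAF_KNOWS, P + "tom"),
}

# Roughly the same question in SPARQL (shown for comparison, not executed):
SPARQL_EQUIVALENT = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?fof WHERE {
  <http://example.org/people/sue> foaf:knows ?friend .
  ?friend foaf:knows ?fof .
  FILTER (?fof != <http://example.org/people/sue>)
}
"""


def friends_of_friends(graph, person):
    """People known by someone `person` knows, excluding `person` itself."""
    friends = {o for s, p, o in graph if s == person and p == FOAF_KNOWS}
    return {
        o for s, p, o in graph
        if p == FOAF_KNOWS and s in friends and o != person
    }


print(friends_of_friends(graph, P + "sue"))
```

Because the data is already a graph of people linked by `knows`, the query is a two-hop pattern match rather than a chain of table joins.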
There are at least two other places where the Semantic Web naturally leverages and supports collective intelligence. The first is that people and software can generate triples (people could do it by hand, but generally they will do it by filling out Web forms, answering questions, responding to dialog boxes, and so on), and these triples can live all over the Web yet interconnect or intersect whenever they are about the same subjects or objects.
I can create data about a piece of data you created, for example to state that I agree with it or that I know something else about it. You can, in turn, create data about my data. A data set can thus be generated in a distributed way, not unlike a wiki. It doesn’t have to work this way, but it can if people choose to use it like this.
The second point is that OWL, the ontology language, is designed to support an unlimited number of ontologies; there doesn’t have to be one big ontology to "rule them all." Anyone can make a simple or complex ontology and then start making data statements that refer to it. Ontologies can link to or include other ontologies, or pieces of them, to create bigger distributed ontologies that cover more things.
This is like mashing up not only the data but also the schemas. Both are examples of collective intelligence. In the case of ontologies, this is already happening: many ontologies already make use of other ontologies, such as Dublin Core and FOAF.
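One standard, if verbose, way to make statements about another statement is RDF reification, which describes a triple using `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object`. The minimal Python sketch below assumes you publish a claim under a URI of your own, and that I then attach my own statements to that URI from a different site. The `agreesWith`, `confirmedBy`, and `livesIn` predicates and all `*.example` URIs are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/vocab#"  # hypothetical vocabulary


def reify(statement_uri, triple):
    """Describe `triple` as a resource named `statement_uri` (RDF reification)."""
    s, p, o = triple
    return {
        (statement_uri, RDF + "type", RDF + "Statement"),
        (statement_uri, RDF + "subject", s),
        (statement_uri, RDF + "predicate", p),
        (statement_uri, RDF + "object", o),
    }


# Your data: a claim, plus a reified description that gives it a URI.
CLAIM = "http://you.example/claims/1"
your_claim = ("http://you.example/people/sue", EX + "livesIn",
              "http://you.example/places/palo-alto")
your_data = {your_claim} | reify(CLAIM, your_claim)

# My data, published elsewhere: statements *about* your claim.
my_data = {
    ("http://me.example/people/me", EX + "agreesWith", CLAIM),
    (CLAIM, EX + "confirmedBy", "http://me.example/notes/field-visit"),
}


def who_agrees(graph, statement_uri):
    """Everyone who has stated ex:agreesWith about the given statement."""
    return {s for s, p, o in graph if p == EX + "agreesWith" and o == statement_uri}


merged = your_data | my_data
print(who_agrees(merged, CLAIM))
```

You could, in turn, reify one of my statements and comment on it the same way.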
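As an example of mashing up schemas, this Python sketch describes one blog post using terms from Dublin Core (`dc:title`, `dc:creator`), FOAF (`foaf:Person`, `foaf:name`), and a hypothetical local ontology (`my:BlogPost`), then serializes the result as N-Triples, one of the standard RDF text formats. The RDF, Dublin Core, and FOAF namespace URIs are real; the post, the author, and the `my:` namespace are invented, and the serializer handles only the escaping this example needs, not the full N-Triples grammar.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements
FOAF = "http://xmlns.com/foaf/0.1/"      # Friend of a Friend
MY = "http://example.org/my-ontology#"   # hypothetical local ontology


class Literal(str):
    """Marks a plain string literal, as opposed to a URI reference."""


def term(value):
    """Render one term in N-Triples syntax: <uri> or "literal"."""
    if isinstance(value, Literal):
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                        .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(triples):
    """One `subject predicate object .` line per triple, in sorted order."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in sorted(triples))


POST = "http://example.org/posts/hyperdata"  # hypothetical
AUTHOR = "http://example.org/people/author"  # hypothetical

triples = {
    (POST, RDF_TYPE, MY + "BlogPost"),           # local schema
    (POST, DC + "title", Literal("Hyperdata")),  # borrowed from Dublin Core
    (POST, DC + "creator", AUTHOR),
    (AUTHOR, RDF_TYPE, FOAF + "Person"),         # borrowed from FOAF
    (AUTHOR, FOAF + "name", Literal("Example Author")),
}
nt = to_ntriples(triples)
print(nt)
```

Any application that already understands Dublin Core or FOAF can make partial sense of this post without knowing anything about the local `my:` ontology.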
The point here is that there is a natural and very beneficial fit between the technologies of the Semantic Web and what Tim O’Reilly defines Web 2.0 to be about (essentially, collective intelligence). In fact, the designers of the Semantic Web’s underlying standards specifically had collective intelligence in mind when they came up with these ideas. They were trying to rectify several problems in the closed, data-silo world of old-fashioned databases. The big motivation was to make data more integrated, to let applications share data more easily, to build data from other data, and to build schemas from other schemas. It’s all about enabling connections and network effects.
Now, whether people end up using these technologies to do interesting things that enable human-level collective intelligence (as opposed to just software-level collective intelligence) is an open question. At least some companies, such as my own Radar Networks, as well as Metaweb and Talis (thanks, Danny), are directly focused on this, and I think it is safe to say this will be a big emerging trend. RDF is a great fit for social and folksonomy-based applications.
Web 3.0 and the concept of "Hyperdata"
Whereas Tim defines Web 2.0 as being about collective intelligence generally, I would define Web 3.0 as being about "connective intelligence": connecting data, concepts, applications, and ultimately people. The real essence of what makes the Web great is that it provides a global hypertext medium in which collective intelligence can emerge. Web 3.0 begins with the Data Web and will evolve into the full-blown Semantic Web over a decade or more; its key contribution is a global hyperdata medium, not just hypertext.
As I mentioned above, hyperdata is to data what hypertext is to text. It’s a great word: simple, yet it makes a big point. It’s about data that links to other data. That’s what RDF and the Semantic Web are really all about. Reasoning is NOT the main point (though it is a nice future side effect). The main point is growing a web of data.
Just as the Web enabled a huge outpouring of collective intelligence via an open global hypertext medium, the Semantic Web is going to enable a similarly huge outpouring of collective knowledge and cognition via a global hyperdata medium. It’s the Web, only better.