From the Metaweb to the Semantic Web: A Roadmap

In previous articles on this Weblog I have suggested that we name the new evolution of the Web that is emerging from the confluence of Weblogging and RSS, “The Metaweb.” The Metaweb is a metadata-driven Web of microcontent. We can see it emerging, and chart its growth, on sites such as Technorati and Daypop. The Metaweb is happening today; it is real. You are browsing it now by reading this page.

I believe that the Metaweb is the first step in the evolution of the coming Semantic Web. The Semantic Web is a Web of ontologically defined information. Ontologies are formal systems of concepts that can be used to rigorously define what things mean and how they relate. For example, an ontology about cameras would define basic concepts such as “lens,” “viewfinder,” “film,” “tripod,” “zoom lens,” “shutter speed,” etc.

By linking content about cameras to the appropriate definitions in the camera ontology, it then becomes possible for software to do a better job of understanding what the content means. That’s the first goal of the Semantic Web — simply adding more semantics to information so that it can be understood better by machines (and people). This can be done today.
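As a rough illustration of what “linking content to definitions” could look like, here is a minimal Python sketch. The `cam:` identifiers, their definitions, and the `ContentItem` class are all hypothetical; a real Semantic Web system would more likely express both the ontology and the links in RDF/OWL rather than in Python dictionaries.

```python
from dataclasses import dataclass, field

# Hypothetical camera ontology: concept identifier -> formal definition.
# The "cam:" names and definitions are illustrative, not a real vocabulary.
CAMERA_ONTOLOGY: dict[str, str] = {
    "cam:Lens": "An optical element that focuses light onto the film or sensor.",
    "cam:ZoomLens": "A lens whose focal length can be varied.",
    "cam:Viewfinder": "The window or screen used to frame a shot.",
    "cam:ShutterSpeed": "How long the shutter stays open during an exposure.",
}


@dataclass
class ContentItem:
    """A piece of content plus explicit links to ontology concepts."""
    title: str
    text: str
    concepts: list[str] = field(default_factory=list)

    def link(self, concept: str) -> None:
        # Only accept concepts that the ontology actually defines.
        if concept not in CAMERA_ONTOLOGY:
            raise KeyError(f"unknown concept: {concept}")
        self.concepts.append(concept)

    def definitions(self) -> dict[str, str]:
        # Software can now look up what each linked concept means.
        return {c: CAMERA_ONTOLOGY[c] for c in self.concepts}


review = ContentItem("28-135mm lens review", "Sharp across the whole range...")
review.link("cam:ZoomLens")
print(review.definitions())
```

The one design choice worth noting is that `link` refuses concepts the ontology does not define, so every link a program follows leads to a formal definition rather than to an arbitrary free-text tag.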

The second goal of the Semantic Web is to enable software to think more intelligently about information by providing a formal means to express and derive abstract logical relationships, inferences and proofs, and arbitrary formal statements about information. This can be done today too, but to do it well requires artificial intelligence. The first goal of the Semantic Web — semantic metadata — is near-term, the second goal — intelligent information processing — is long-term. The point of this article is that the Metaweb is the first step towards achieving both these goals.
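To give a flavor of the simplest kind of inference involved, the sketch below derives implicit topics from subclass relationships, in the spirit of RDF Schema's `rdfs:subClassOf`: a subclass relation is transitive, so anything about a zoom lens is also about lenses and, further up, about optical components. The concept names and the `SUBCLASS_OF` table are hypothetical, and this is only a toy; the proofs and arbitrary formal statements described above call for far more capable reasoners.

```python
# Hypothetical subclass assertions: (narrower concept, broader concept).
SUBCLASS_OF: set[tuple[str, str]] = {
    ("cam:ZoomLens", "cam:Lens"),
    ("cam:Lens", "cam:OpticalComponent"),
    ("cam:Viewfinder", "cam:OpticalComponent"),
}


def superclasses(concept: str) -> set[str]:
    """Return every concept that `concept` is (transitively) a subclass of."""
    found: set[str] = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for narrow, broad in SUBCLASS_OF:
            if narrow == current and broad not in found:
                found.add(broad)
                frontier.append(broad)
    return found


def infer_topics(asserted: list[str]) -> set[str]:
    """Asserted concepts plus everything they entail via subclass links."""
    topics = set(asserted)
    for c in asserted:
        topics |= superclasses(c)
    return topics


print(sorted(infer_topics(["cam:ZoomLens"])))
# ['cam:Lens', 'cam:OpticalComponent', 'cam:ZoomLens']
```

Nothing about “lenses” was stated for the zoom-lens content; the broader topics fall out of the ontology, which is the kind of machine-derived knowledge the long-term goal depends on.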

It all starts with RSS, in my opinion.

RSS is a metadata format for publishing and subscribing to metacontent objects, the units of the Metaweb. RSS (in various flavors and soon in Atom, a new open standard based on RSS) is already in wide use on Weblogs and content syndication sites today. Numerous large and small organizations and content providers publish and subscribe to RSS as a means to exchange and track ideas.

The first step in the process of evolving the Semantic Web is to bring about widespread adoption of the Metaweb: of Weblogs, RSS, Atom and other emerging microcontent media. As microcontent begins to play an increasingly important role on the Web, and in our personal and work lives, it will set the stage for the gradual introduction of ever-richer microcontent formats and protocols, eventually leading to full Semantic Web microcontent.

Existing microcontent standards such as RSS are extremely barebones, and the emerging Atom spec looks to be no less lightweight so far. There are many problems with RSS and Atom — chief among them in my opinion is that while they are extensible there is really no easy way to make use of extensions, and secondly, they do not provide semantically defined metatags. Anyone is free to extend such formats with whatever custom metatags they want to put in, but currently there is no way to instantly make those metatags useful in applications that were not written specifically to recognize them, nor is there a way to semantically define the meaning of those tags so that software can understand how to interpret them without human intervention.
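The sketch below, assuming a made-up `cam:` namespace (`http://example.org/camera#`) and a hypothetical `<cam:focalLength>` element, shows this problem in miniature. RSS 2.0 allows extension elements as long as they are defined in their own XML namespace, and Python's standard `xml.etree.ElementTree` parser reads them without complaint, but a generic reader can only report an opaque name; nothing in the feed tells it what the tag means.

```python
import xml.etree.ElementTree as ET

# A minimal RSS 2.0 feed whose item carries a custom extension element.
# The "cam" namespace URI and <cam:focalLength> tag are hypothetical.
FEED = """<rss version="2.0" xmlns:cam="http://example.org/camera#">
  <channel>
    <title>Example camera blog</title>
    <item>
      <title>Zoom lens review</title>
      <cam:focalLength>28-135mm</cam:focalLength>
    </item>
  </channel>
</rss>"""

# A subset of the standard RSS 2.0 item elements, kept short for brevity.
KNOWN_TAGS = {"title", "link", "description", "pubDate", "guid"}


def read_item(item: ET.Element) -> tuple[dict[str, str], list[str]]:
    """Split an item's children into core RSS fields and unrecognized extensions."""
    core: dict[str, str] = {}
    unknown: list[str] = []
    for child in item:
        if child.tag in KNOWN_TAGS:
            core[child.tag] = (child.text or "").strip()
        else:
            # ElementTree reports namespaced tags as "{namespace-uri}localname".
            # The reader can see the extension but cannot tell what it means.
            unknown.append(child.tag)
    return core, unknown


root = ET.fromstring(FEED)
for item in root.iter("item"):
    core, unknown = read_item(item)
    print(core)     # {'title': 'Zoom lens review'}
    print(unknown)  # ['{http://example.org/camera#}focalLength']
```

An application written specifically for the `cam` namespace could interpret `focalLength`; every other reader is left with a string it can store or display but not understand.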

So the next step after the widespread adoption of metacontent standards like RSS and Atom is to add support for extending these formats pervasively, and to put more semantics into microcontent. We are working on these problems at Radar Networks.

In order to add rich semantics to microcontent (or any content for that matter), there needs to be a formally defined semantics in the first place. This brings us to the subject of ontologies. Ontologies, as I have explained earlier, are formal conceptual models. They define systems of concepts.

There are a number of ontologies in existence today; however, for the most part they are either too high-level and abstract or too vertical and specific to be of much use to the average Web surfer. For example, the SUMO ontology provides a good upper ontology that defines abstract concepts such as units of measurement, common types of entities, and common relationships among them. OpenCyc is another ontology, one that focuses mainly on “common sense knowledge,” such as concepts related to shopping, social relationships, places, etc. Other ontologies are more vertical: for instance, DARPA has funded the development of a number of ontologies that capture knowledge related to warfare, and the NIH has funded work on ontologies related to medicine. But there is no ontology that provides good semantic definitions of the kinds of things that ordinary consumers and knowledge workers deal with.

At Radar Networks we have been working to define this ontology, which we call “The Infoworker Ontology,” with the goal of eventually contributing it to a standards body. The Infoworker Ontology is a mid-level horizontal ontology that defines the semantics of common entities and relationships in the domain of knowledge work: things like documents, events, projects, tasks, people, groups, etc. The development and adoption of an open, extensible, and widely used Infoworker Ontology is a necessary step towards making the Semantic Web useful to ordinary mortals (as opposed to academic researchers).

By connecting microcontent objects to the Infoworker Ontology, a new generation of semantic microcontent (what we call “metacontent”) is enabled. With the right tools, even non-technical consumers will be able to author and use metacontent.

It is at this point that the Metaweb begins to evolve into the Semantic Web: the moment when someone adds semantics to microcontent in a manner that everyone can use. This is what we have done at Radar Networks. But doing it right is non-trivial: a number of incredibly complex and subtle issues must be solved.

After three years of working on this problem, we are confident that we have the right approach. In the coming months I will begin to describe our approach on this Weblog. Stay tuned!

9 thoughts on “From the Metaweb to the Semantic Web: A Roadmap”

  1. From the Metaweb to the Semantic Web: A Roadmap

    In previous articles on this Weblog I have suggested that we name the new evolution of the Web that…

  2. PS. It just occurred to me that your ontology may not be RDF/OWL-based. It also occurred to me that there are existing ontologies that cover the areas you mention: documents [Dublin Core], events [iCal], projects, tasks [http://purl.org/stuff/project#], people, groups [FOAF].
    How then does it differ from these?
    PPS. I’ve just had my agents nosing around and spoken to a few bots, and it doesn’t look like our paths have crossed before. So I’d better formally say “hi!” I’ve been working around the same area for the past few years.

  3. It seems to me that the greatest benefit of a standard “metaweb” ontology is simply the fact that it is a standard: everyone is speaking the same language.
    I think that the ontological infrastructure that wins will be the one that can bootstrap itself. By this I mean an ontology that contains data pertinent to consensus building, such as opinions on a specific topic, polls, reviews of standards, etc. For this to work, I think a distributed reputation system would be mandatory, not just a good idea.

  4. You are zeroing in on the problem when you say:
    “…there is really no easy way to make use of extensions, and secondly, they do not provide semantically defined metatags. Anyone is free to extend such formats with whatever custom metatags they want to put in, but currently there is no way to instantly make those metatags useful in applications that were not written specifically to recognize them, nor is there a way to semantically define the meaning of those tags so that software can understand how to interpret them without human intervention.”
    Ontologies may provide part of the answer, but only part; and in any case, how do you make ontologies interoperable and automatable by an application that isn’t known in advance? You might want to take a look at what the OMG is doing with MOF (Meta Object Facility). You may find concepts there that can be adapted to the Metaweb.
    As you hint, what’s data and what’s metadata depends on the point of view. In MOF, Level 1 metadata is a model of real-world data (Level 0). (I’m using metadata in the sense of information that defines and controls the meaning and structure of data, not descriptive metadata, which is really just another kind of data.) Just as metadata makes data “smart”, there needs to be a mechanism to make metadata smart, i.e. to model the metadata, defining and controlling its meaning and structure so it can be automated. This is where the Level 2 Meta Model comes in: a way to define the language of the metadata.
    Finally you need a way to model the meta model (Level 3 or Meta Meta Model) so that your models and meta models can be interchanged with people and applications using a different modeling approach (e.g. relational database).
    This four-level meta-structure concept (or a similar one, perhaps with additional dimensions), combined with additional ideas from XML registries, might provide partial keys to realizing the full potential for interoperability across the Metaweb, databases, applications (services) and unstructured content. (Stealth radar will detect big blue whales in this vicinity.)

  5. a cumulative Web 2.0 definition …

    … quite provisional, of course. Mainly based on the proto-definition work of Richard MacManus’ Read/WriteWeb, still the most important resource for Web 2.0. The Wikipedia entry is also valuable, despite being “disputed”. At present I see five characteris…
