The Ontology Integration Problem

The OWL language, and tools such as Protege and TopBraid Composer, make it easy to design ontologies. But what about the problem of integrating disparate ontologies? I haven’t really found a good solution for this yet.

In my own experience designing a number of OWL ontologies (typically 500–3,000 classes), it has often been easier to create my own custom ontology branches to cover various concepts than to try to integrate other ontologies of those concepts into my own.

One of the reasons for this is that each ontology has its own naming conventions, philosophical orientation, domain nuances, design biases and tradeoffs, often guided by the particular people and needs that drove its creation. Integrating across these different worldviews and underlying constraints is often hard. Simply stating that various classes or properties are equivalent is not necessarily a solution, because their inheritance may not in fact be equivalent, and thus they may actually be semantically quite different in function, regardless of expressions of equivalence. OWL probably needs to be a lot more expressive in defining mappings between ontologies to truly resolve such subtle problems.
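To make the inheritance problem concrete, here is a minimal sketch in plain Python (with invented class names, not taken from any real ontology): two classes declared equivalent can still inherit entirely different commitments from their respective hierarchies.

```python
def ancestors(cls, parents):
    """Return the transitive superclasses of cls, given a map
    {class: [direct superclasses]}."""
    seen, stack = set(), [cls]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Two hypothetical ontologies that both define a "Car" class.
onto_a = {"a:Vehicle": [], "a:Car": ["a:Vehicle"]}
onto_b = {"b:Artifact": [], "b:Toy": ["b:Artifact"], "b:Car": ["b:Toy"]}

# Declaring a:Car equivalent to b:Car does not reconcile what each
# class inherits: one is a kind of vehicle, the other a kind of toy.
print(sorted(ancestors("a:Car", onto_a)))  # ['a:Vehicle']
print(sorted(ancestors("b:Car", onto_b)))  # ['b:Artifact', 'b:Toy']
```

Any reasoner treating the two as interchangeable would silently merge these incompatible inherited contexts, which is exactly the subtle mismatch described above.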

The alternative to mapping — importing external ontologies into your own — is also not great, because it usually results in redundancies, as well as inconsistent naming conventions and points of view. As you keep adding colors to your palette, it starts to become kind of brown. If the goal is to make ontologies that are elegant, easy to maintain, extend, understand and apply, importing ontologies into other ontologies doesn’t seem to be the way to accomplish that. Different ontologies usually don’t fit together well, or even at all in some cases.

Because of the above problems, it is often easier to simply reinvent the wheel. Instead of trying to map between ontologies, or import other ontologies into one’s own ontology, it is usually easier to just write everything oneself. Of course this is not really as efficient as we might like — it would certainly be great to be able to easily reuse other people’s ontologies in one’s own ontologies. But in practice that is still very, very hard.

And this is a problem, isn’t it? Because if the semantic web is really going to take off, we have to either find easy and effective ways to connect ontologies together, or we have to get everyone to use the same ontology. Nobody has yet solved the problems of mapping and importing ontologies well enough. And likewise, so far nobody has succeeded in making the uber-ontology and convincing everyone else to use it. In fact, I think it’s probably safe to say that the more comprehensive and powerful an ontology is, the fewer the number of people who will agree on it, let alone understand it well enough to use it.

The dream of the Semantic Web vision is that someday there will be thousands or millions of ontologies around the web, and millions of instances of them. And these will all somehow be integrated automagically, or at least if they aren’t integrated on the semantic level, then there will be magic software that embodies that integration. In any case, the hope is that someday intelligent agents will be able to freely and seamlessly roam around harvesting this data, squishing it together into knowledgebases, and reasoning across them. But neither harvesting, nor squishing, nor reasoning can really take place without some level of semantic integration of the underlying ontologies. Yet how will all these disparate ontologies be connected? Unless mappings are created between them, instead of a Semantic Web, we’ll just have millions of little semantic silos. Maybe some company will succeed in making the biggest silo, and that will be “the” semantic web to most people. That might be the best solution in fact, but I’m not sure that is really what Tim Berners-Lee had in mind! If that is not the solution that the semantic web community wants, then the integration issue needs to be solved sooner rather than later. The longer we wait, the harder it will get to solve, because the number of ontologies is increasing with time.

So in conclusion, I think that the most critical missing piece of the semantic web puzzle is a good tool — and a good methodology — for mapping between ontologies. I just haven’t found one yet (but if you have, feel free to suggest it to me!). The reason I think a mapping tool is a critical need is that while in theory it’s a nice idea to imagine ontologists reusing one another’s ontologies, in practice many ontologists (especially those working on large, complex ontologies) would rather write their own internally consistent ontologies and map them to other ontologies than import other ontologies into what they are making and then have to deal with all the inconsistencies and confusion that arise from doing that. In practice, ontologists are usually people who value elegance and consistency: a solution that runs contrary to those values won’t really be adopted by that community (of which I am a member).

The OWL language provides a means to express mappings between equivalent classes and equivalent properties, for example, and that might be good enough. But I haven’t seen good support for actually building and managing such mappings within the ontology development tools I’ve looked at. And until this process of mapping between ontologies is made far more productive and powerful, we will see increasing fragmentation instead of integration across the semantic web. Similarly, in tools like Protege, you can import other ontologies, but once you do so, very little support is provided for working with and modifying the new combined ontology.

The requirements for a good semantic integration tool are numerous. But chief among them is that such tools need to move beyond merely helping with integration between two ontologies — they need to help an ontologist map their ontology to perhaps tens of other ontologies. There will also need to be specialized error-checking capabilities and consistency checkers — to look for logical problems and inheritance incompatibilities that may arise in complex mappings, and to identify classes and properties that should be mapped but were missed. Perhaps by analysing instance data from different ontologies (such as different ontologies’ representations of the same unique entities or concepts), these tools could even learn or suggest mappings, in order to assist or automate the mapping process to some degree. I have seen papers on automatic ontology mapping, but these capabilities haven’t made it into the ontology design tools. This needs to happen.
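A minimal sketch of that last idea, using invented class and instance names: if two classes from different ontologies describe largely the same set of entities, their instance overlap can be used to suggest a candidate mapping.

```python
def suggest_mappings(classes_a, classes_b, threshold=0.5):
    """Suggest candidate class mappings between two ontologies from
    shared instance data.

    classes_a, classes_b: {class name: set of instance identifiers}.
    Returns (class_a, class_b, score) triples whose instance sets have
    a Jaccard similarity of at least `threshold`, best matches first.
    """
    suggestions = []
    for name_a, insts_a in classes_a.items():
        for name_b, insts_b in classes_b.items():
            union = insts_a | insts_b
            if not union:
                continue  # no instance data to compare
            score = len(insts_a & insts_b) / len(union)
            if score >= threshold:
                suggestions.append((name_a, name_b, round(score, 2)))
    return sorted(suggestions, key=lambda triple: -triple[2])

# Two hypothetical ontologies that describe some of the same entities.
geo_a = {"a:City": {"paris", "rome", "berlin"}}
geo_b = {"b:Town": {"paris", "rome", "oslo"}, "b:River": {"seine"}}

print(suggest_mappings(geo_a, geo_b))  # [('a:City', 'b:Town', 0.5)]
```

Real matchers also weigh label similarity, structure, and property usage; this only illustrates the instance-overlap signal, which is one input such a tool could use to propose mappings for human review.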

Until the process of integrating ontologies is less work than simply reinventing the wheel, we are not going to see much semantic integration on the semantic web. In short, the vision of the semantic web as a decentralized fabric in which multiple ontologies interoperate really hangs on a good solution to this issue.

I believe the semantic web is emerging and will continue to evolve even if semantic integration is not made easy — but in such a case, I think it will ultimately be dominated by a few large ontologies and service providers that everyone integrates with, rather than realizing the original vision of a more decentralized system.

7 thoughts on “The Ontology Integration Problem”

  1. I agree about the problem. But does one not have the same problem in Java? In Java everyone can go and create their own classes, and that’s what most people in fact do. Then, when they find that there is really a large, distributed need for the same functionality, pressure is created towards integrating those classes into standardised and well-established libraries. These then get to be widely used, and the cycle starts again.
    Integration on the Semantic Web should be a lot easier than with Java in some ways. But I can see the same thing happening. People open up their databases and create their own ontologies. Then they find that a number of people share the same terms, so they might as well standardise on those, for legal and for business reasons (it’s difficult to maintain your own, there’s less trust, and there’s the network effect). Hence the pressure will build towards standardised ontologies.
    This is not to say that good integration tools would not be useful. In fact, such a tool would be very powerful and would make things a lot easier, a little bit like refactoring IDEs in Java.

  2. This is a thoughtful introduction to the nature of the beast. Let me say a few words about the path I am taking in this regard, a simple path: I am not in pursuit of any *integration* methodology. Rather, I am evolving methodologies for *federation*. Patrick Durusau and I gave a telecon lecture on the early version of federation [1] and I am now building the platform to do subject-centric federation. At SRI, we grafted a “delicious” workalike we call Tagomizer onto my subject map provider TopicSpaces. We did that to explore more learning opportunities for our project CALO.
    I realize it’s a kind of change of subject from “integration” to “federation.” There are, I think, two primary use cases for ontologies that direct how they are crafted and how they are used. One is the purely authoritative stance, where questions to be answered must be judged, by some authority, to be correct. The other is not at all authoritative, and can be thought to be closer to the general “understanding some universe of discourse” needs of humanity. One would likely never want to integrate authoritative ontologies, except to the extent that some information will be lost when one “authority” contradicts another and the merging process is required to make a choice. But it’s more than a good idea to federate disparate world views in order to more thoroughly present some universe of discourse. No information is lost. That’s the role of subject-centric federation.
    As a final comment, it’s bound to happen that some ontology classes imported into a subject map will find no “mates” with which to merge — nothing else in the map talking about the same subject. Those new “subjects” will not become islands in the map; they will always be linked to the subject that is their source, as will be each merged class within the subject proxy that is its new container in the map.

  3. This may be a common problem for all 4th-generation language programming. When I write SQL or XSLT, I have less incentive to reuse, partly because of the power of these languages, and partly because of the effort involved in adapting the abstraction layer (e.g. table structure, XML schema).
    For ontologies, maybe we should start questioning whether mixing OO (Java) and functional rules is appropriate, even though most people say it’s a happy marriage.
    The question is: with “enough” abstracted information, aspects, and ontologies, is it possible to blur the line between reality and virtuality?

  4. I have to agree with bblfish; in fact, he took the words right off my tongue. Encouraging authors to create their own ontologies from scratch will be a disaster for the semantic web, despite the existence of “merging” tools and interfaces. And any effort to merge synonymous ontologies would be as wasteful as an effort to merge Java classes in different libraries which serve the same purpose. It may even be a bit reckless to do so. Remember, we adopted the semantic web to finally get away from ambiguity. What you’re proposing will only encourage it. Ontology authors should instead be encouraged (through the availability of good search tools) to find, reuse, extend and re-publish! I understand your argument about the idiosyncrasies of different authors’ needs. Perhaps by establishing good conventions and best practices for developing “extension-friendly” ontologies, authors can be encouraged to develop their ontologies with the idea in mind that they are to be used by others. This may mean refraining from using organizational methodologies which might hinder others’ efforts.
    I think semantic silos are created when ontologies are authored in vacuums. The answer could be a wikitology. This allows the greatest common denominator of methodologies to win out by democracy. As for those ontologists who can’t shoe-horn their needs into what’s available in the wiki, here again, I believe they have no choice, because on the semantic web, if you’re not talking the same language as everyone else, then you simply won’t be heard (by SW agents, indexing tools, crawlers, etc.). With that aside, though, I think such a wikitology could provide the “source of truth” that is lacking, and still accommodate the need for autonomy which you speak of, since everyone has a chance to design and influence the features of the ontology. There are public indices of RDF ontologies (schemaweb, pingthesemanticweb), and there are even semantic wikis (ontoworld), but to my knowledge nobody has yet created a wiki that allows us to collaboratively develop ontologies. If anyone has, please post, as I’d like to know how I can help.

  5. Great summary of the problem facing semantic web development. My personal experience leans toward putting the emphasis first on building a good ontology for your application rather than on reusing existing ones. This is because, at this early stage, it’s rare to find classes or properties in existing ontologies that can meet all the specific requirements of the new ontology being developed.
    For example, I tried to reuse as much as I could for the scientific publishing ontology being developed under the W3C task force I’m coordinating. The current version has properties taken from DC and FOAF, but I always feel it’s not right. Often, the terms look right, but the ontological definition is off. I may have to throw most of them out in the next revision.
    Some useful applications do not necessarily rely on the integration of data represented in many different ontologies. Semantic publishing, which I’m experimenting with now, is an example. I think compromising the integrity of the ontology itself for the sake of ontology reuse or future integration does not serve the purpose well.

  6. Hi everybody,
    TermExtractor, my master’s thesis, is now online.
    TermExtractor is a software package for the automatic extraction of terminology consensually referred to in a specific application domain. The package takes as input a corpus of domain documents, parses the documents, and extracts a list of “syntactically plausible” terms (e.g. compounds, adjective-noun pairs, etc.). Document parsing assigns greater importance to terms with distinctive text layouts (title, bold, italic, underlined, etc.). Two entropy-based measures, called Domain Relevance and Domain Consensus, are then used. Domain Consensus selects only the terms which are consensually referred to throughout the corpus documents; Domain Relevance selects only the terms which are relevant to the domain of interest, and is computed with reference to a set of contrastive terminologies from different domains. Finally, extracted terms are further filtered using Lexical Cohesion, which measures the degree of association of all the words in a terminological string. Accepted file formats are: txt, pdf, ps, dvi, tex, doc, rtf, ppt, xls, xml, html/htm, chm, wpd, and also zip archives.
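    A rough sketch of how an entropy-based consensus measure like the one described can work (my own reconstruction, not TermExtractor’s actual code): treat a term’s counts across the corpus documents as a distribution, and score the term by the entropy of that distribution, so terms used evenly across many documents score higher than terms concentrated in a single document.

    ```python
    import math

    def domain_consensus(counts_per_document):
        """Entropy of one term's raw counts across the corpus documents.

        Higher entropy means the term is used evenly ("consensually")
        across the corpus rather than concentrated in a few documents.
        """
        total = sum(counts_per_document)
        if total == 0:
            return 0.0
        probabilities = [c / total for c in counts_per_document if c > 0]
        return sum(p * math.log2(1 / p) for p in probabilities)

    # A term spread evenly over four documents beats one that only
    # occurs in a single document.
    print(domain_consensus([5, 5, 5, 5]))   # 2.0
    print(domain_consensus([20, 0, 0, 0]))  # 0.0
    ```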

  7. As long as work concentrates mostly on the semantic representation of knowledge using ontologies, the vision of the semantic web will not be fully achieved. The real problem lies in interoperability between ontologies. If mapping has failed to provide a viable solution (since none exists), the solution may lie in adding one extra layer to the semantic web, called the interoperability layer. This layer will provide the principles, theories and interactive agents of interoperability. The layer is expected to interact with the ontology layer of the semantic web above it, and also with the interoperability layers of other documents, seen as layers beneath it (although each is a different document).