The Ontology Integration Problem
August 31st, 2006
The OWL language and tools such as Protege and TopBraid Composer make it easy to design ontologies. But what about the problem of integrating disparate ontologies? I haven’t really found a good solution for this yet.
In my own experience designing a number of OWL ontologies (typically 500 to 3000 classes each), it has often been easier to create my own custom ontology branches to cover various concepts than to try to integrate other ontologies of those concepts into my own.
One of the reasons for this is that each ontology has its own naming conventions, philosophical orientation, domain nuances, design biases and tradeoffs, often guided by the particular people and needs that drove its creation. Integrating across these different worldviews and underlying constraints is often hard. Simply stating that various classes or properties are equivalent is not necessarily a solution, because their inheritance may not in fact be equivalent, and thus they may actually be semantically quite different in function, regardless of expressions of equivalence. OWL probably needs to be a lot more expressive in defining mappings between ontologies to truly resolve such subtle problems.
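To make the inheritance point concrete, here is a minimal sketch in plain Python. It assumes each ontology is reduced to a dictionary from a class to its direct superclasses, and that the claimed equivalences are a dictionary from one vocabulary to the other. The class names, the data, and the function names `ancestors` and `inheritance_conflicts` are all hypothetical, and a real checker would work on actual OWL semantics rather than this toy structure.

```python
def ancestors(hierarchy, cls):
    """Return every superclass reachable from cls (excluding cls itself)."""
    seen = set()
    stack = list(hierarchy.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(hierarchy.get(parent, ()))
    return seen


def inheritance_conflicts(onto_a, onto_b, equivalences):
    """List claimed equivalences whose mapped ancestors disagree.

    equivalences maps a class in onto_a to the class in onto_b that is
    claimed to be equivalent to it.
    """
    conflicts = []
    for cls_a, cls_b in equivalences.items():
        # Translate A-side ancestors into B's vocabulary where a mapping exists.
        mapped = {
            equivalences[a] for a in ancestors(onto_a, cls_a) if a in equivalences
        }
        # Any translated ancestor that B does not place above cls_b is a conflict.
        missing = mapped - ancestors(onto_b, cls_b)
        if missing:
            conflicts.append((cls_a, cls_b, sorted(missing)))
    return conflicts


# Hypothetical toy ontologies: class -> direct superclasses.
onto_a = {"Author": ["Person"], "Person": ["Agent"]}
onto_b = {"Writer": ["Role"], "Human": ["Being"]}
equivalences = {"Author": "Writer", "Person": "Human"}

conflicts = inheritance_conflicts(onto_a, onto_b, equivalences)
print(conflicts)
```

In this toy case "Author" is declared equivalent to "Writer" and "Person" to "Human", yet one ontology treats an author as a kind of person while the other treats a writer as a role, so the pair is flagged even though each equivalence looks reasonable on its own.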
The alternative to mapping, importing external ontologies into your own, is also not great because it usually results in redundancies, as well as inconsistent naming conventions and points of view. As you keep adding colors to your palette, it starts to become kind of brown. If the goal is to make ontologies that are elegant and easy to maintain, extend, understand and apply, importing ontologies into other ontologies doesn’t seem to be the way to accomplish that. Different ontologies usually don’t fit together well, or even at all in some cases.
Because of the above problems, it is often easier to simply reinvent the wheel. Instead of trying to map between ontologies, or import other ontologies into one’s own ontology, it is usually easier to just write everything oneself. Of course this is not really as efficient as we might like; it would certainly be great to be able to easily reuse other people’s ontologies in one’s own ontologies. But in practice that is still very, very hard.
And this is a problem, isn’t it? Because if the semantic web is really going to take off we have to either find easy and effective ways to connect ontologies together, or we have to get everyone to use the same ontology. Nobody has yet solved the problems of mapping and importing ontologies well enough. And likewise so far nobody has succeeded in making the uber-ontology and convincing everyone else to use it. In fact, I think it’s probably safe to say that the more comprehensive and powerful an ontology is, the fewer the number of people who will agree on it, let alone understand it well enough to use it.
The dream of the Semantic Web vision is that someday there will be thousands or millions of ontologies around the web, and millions of instances of them. And these will all somehow be integrated automagically, or at least if they aren’t integrated on the semantic level, then there will be magic software that embodies that integration. In any case, the hope is that someday intelligent agents will be able to freely and seamlessly roam around harvesting this data, squishing it together into knowledgebases, and reasoning across them. But neither harvesting, nor squishing, nor reasoning can really take place without some level of semantic integration of the underlying ontologies. Yet, how will all these disparate ontologies be connected? Unless mappings are created between them, instead of a Semantic Web, we’ll just have millions of little semantic silos. Maybe some company will succeed in making the biggest silo and that will be “the” semantic web to most people. That might be the best solution in fact, but I’m not sure that is really what Tim Berners-Lee had in mind! If that is not the solution that the semantic web community wants, then the integration issue needs to be solved sooner rather than later. The longer we wait to solve this, the harder it will get to solve it later on, because the number of ontologies is increasing with time.
So in conclusion, I think that the most critical missing piece of the semantic web puzzle is a good tool, and a good methodology, for mapping between ontologies. I just haven’t found one yet (but if you have, feel free to suggest it to me!). The reason I think a mapping tool is a critical need is that, while in theory it’s a nice idea to imagine ontologists reusing ontologies from one another, in practice many ontologists (especially those working on large complex ontologies) would rather write their own internally consistent ontologies and map them to other ontologies than import other ontologies into what they are making and then have to deal with all the inconsistencies and confusion that arise from doing that. In practice, ontologists are usually people who value elegance and consistency: a solution that runs contrary to those values won’t really be adopted by that community (of which I am a member).
The OWL language provides a means to express mappings between equivalent classes and equivalent properties, for example, and that might be good enough. But I haven’t seen good support for actually building, and managing, such mappings within the ontology development tools I’ve looked at. And until this process of mapping between ontologies is made far more productive and powerful, we will see increasing fragmentation instead of integration across the semantic web. Similarly, in tools like Protege, you can import other ontologies, but once you do so, very little support is provided for working with and modifying the new combined ontology.
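For readers who have not seen what such a mapping looks like on the wire, here is a small sketch that writes equivalence statements as N-Triples lines using only plain Python string formatting. The `owl:equivalentClass` and `owl:equivalentProperty` terms are part of OWL; the `example.org` IRIs, the class and property names, and the helper `mapping_triples` are made up for illustration.

```python
OWL = "http://www.w3.org/2002/07/owl#"


def mapping_triples(class_pairs, property_pairs):
    """Return N-Triples lines asserting OWL equivalences between two ontologies."""
    lines = []
    for a, b in class_pairs:
        lines.append(f"<{a}> <{OWL}equivalentClass> <{b}> .")
    for a, b in property_pairs:
        lines.append(f"<{a}> <{OWL}equivalentProperty> <{b}> .")
    return lines


# Hypothetical namespaces standing in for "my" ontology and someone else's.
mine = "http://example.org/mine#"
theirs = "http://example.org/theirs#"

triples = mapping_triples(
    class_pairs=[(mine + "Author", theirs + "Writer")],
    property_pairs=[(mine + "wrote", theirs + "authorOf")],
)
print("\n".join(triples))
```

Keeping the mappings in a separate file like this, rather than editing either ontology, is one way to leave both sides internally consistent, though it does nothing by itself to help build or maintain the mappings.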
The requirements for a good semantic integration tool are numerous. But chief among them is that such tools need to move beyond merely helping with integration between two ontologies: they need to help an ontologist map their ontology to perhaps tens of other ontologies. There will also need to be specialized error checking capabilities and consistency checkers, to look for logical problems and inheritance incompatibilities that may arise in complex mappings, and to identify classes and properties that should be mapped but were missed. Perhaps by analysing instance data from different ontologies (such as different ontologies’ representations of the same unique entities or concepts) these tools could even learn or suggest mappings in order to assist or automate the mapping process to some degree. I have seen papers on automatic ontology mapping, but these capabilities haven’t made it into the ontology design tools. This needs to happen.
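As a rough illustration of the instance-based idea, the sketch below proposes candidate class mappings by measuring how much two classes' instance sets overlap (Jaccard similarity). It assumes the two ontologies already identify the same entities with shared identifiers, which is a big assumption in practice. The data, the 0.5 threshold, and the function names `jaccard` and `suggest_mappings` are hypothetical, and this is not the method of any particular paper or tool.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_mappings(instances_a, instances_b, threshold=0.5):
    """Suggest (class_a, class_b, score) pairs whose instance sets overlap enough."""
    suggestions = []
    for cls_a, inst_a in instances_a.items():
        for cls_b, inst_b in instances_b.items():
            score = jaccard(set(inst_a), set(inst_b))
            if score >= threshold:
                suggestions.append((cls_a, cls_b, score))
    # Strongest candidates first, for a human ontologist to review.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Hypothetical instance data: class -> identifiers of the entities it contains.
instances_a = {"Author": {"id:1", "id:2", "id:3"}, "Publisher": {"id:9"}}
instances_b = {
    "Writer": {"id:1", "id:2", "id:3", "id:4"},
    "Press": {"id:9"},
    "Editor": {"id:4"},
}

suggestions = suggest_mappings(instances_a, instances_b)
for cls_a, cls_b, score in suggestions:
    print(f"{cls_a} ~ {cls_b} (overlap {score:.2f})")
```

The output is only a ranked list of suggestions; a person would still need to confirm each one, and pairs that should be mapped but share no instances would be missed entirely.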
Until the process of integrating ontologies is less work than simply reinventing the wheel, we are not going to see much semantic integration on the semantic web. In short, the vision of the semantic web as a decentralized fabric in which multiple ontologies interoperate really hangs on a good solution to this issue.
I believe the semantic web is emerging and will continue to evolve even if semantic integration is not made easy, but in such a case, I think ultimately it will be dominated by a few large ontologies and service providers that everyone integrates with, rather than the original vision of a more decentralized system.