The Ontology Integration Problem

The OWL language and tools such as Protege and TopBraid Composer make it easy to design ontologies. But what about the problem of integrating disparate ontologies? I haven’t really found a good solution for this yet.

In my own experience designing a number of OWL ontologies (typically 500 to 3,000 classes each), it has often been easier to create my own custom ontology branches to cover various concepts than to try to integrate other ontologies of those concepts into my own.

One of the reasons for this is that each ontology has its own naming conventions, philosophical orientation, domain nuances, design biases and tradeoffs, often guided by the particular people and needs that drove its creation. Integrating across these different worldviews and underlying constraints is often hard. Simply stating that various classes or properties are equivalent is not necessarily a solution, because their inheritance may not in fact be equivalent. The two classes sit in different hierarchies, so they may actually be semantically quite different in function, regardless of expressions of equivalence. OWL probably needs to be a lot more expressive in defining mappings between ontologies to truly resolve such subtle problems.
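To make the equivalence problem concrete, here is a minimal Python sketch. The class names (my:Person, ext:Person and so on) and both hierarchies are hypothetical, and the small closure function is only a stand-in for a real OWL reasoner, which does far more. It shows that once two classes are declared equivalent, each one picks up the other's superclasses, whether or not that was intended.

```python
from collections import defaultdict

def superclasses(cls, subclass_of, equivalent):
    """Return every class that `cls` is entailed to be a subclass of.

    subclass_of: dict mapping a class to its directly asserted superclasses.
    equivalent:  iterable of (a, b) pairs asserted as equivalent classes.
    """
    # Treat "A equivalent to B" as "A subclass of B" plus "B subclass of A".
    edges = defaultdict(set)
    for child, parents in subclass_of.items():
        edges[child].update(parents)
    for a, b in equivalent:
        edges[a].add(b)
        edges[b].add(a)

    # Depth-first walk up the combined hierarchy.
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for parent in edges[current]:
            if parent not in seen and parent != cls:
                seen.add(parent)
                stack.append(parent)
    return seen

# Hypothetical fragments of two ontologies that both model "person".
subclass_of = {
    "my:Person": {"my:LegalEntity"},
    "ext:Person": {"ext:Animal"},
    "ext:Animal": {"ext:PhysicalObject"},
}

before = superclasses("my:Person", subclass_of, [])
after = superclasses("my:Person", subclass_of, [("my:Person", "ext:Person")])

# Superclasses that my:Person gained only because of the mapping.
unintended = sorted(after - before - {"ext:Person"})
print(unintended)  # ['ext:Animal', 'ext:PhysicalObject']
```

In this toy case my:Person was designed as a kind of legal entity, yet after the mapping it is also entailed to be an animal and a physical object, which may or may not fit the rest of my ontology.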

The alternative to mapping, importing external ontologies into your own, is also not great because it usually results in redundancies, as well as inconsistent naming conventions and points of view. As you keep adding colors to your palette, it starts to become kind of brown. If the goal is to make ontologies that are elegant and easy to maintain, extend, understand and apply, importing ontologies into other ontologies doesn’t seem to be the way to accomplish that. Different ontologies usually don’t fit together well, or even at all in some cases.

Because of the above problems, it is often easier to simply reinvent the wheel. Instead of trying to map between ontologies, or import other ontologies into one’s own ontology, it is usually easier to just write everything oneself. Of course this is not really as efficient as we might like; it would certainly be great to be able to easily reuse other people’s ontologies in one’s own. But in practice that is still very hard.

And this is a problem, isn’t it? Because if the semantic web is really going to take off, we have to either find easy and effective ways to connect ontologies together, or we have to get everyone to use the same ontology. Nobody has yet solved the problems of mapping and importing ontologies well enough. And likewise, so far nobody has succeeded in making the uber-ontology and convincing everyone else to use it. In fact, I think it’s probably safe to say that the more comprehensive and powerful an ontology is, the fewer people will agree on it, let alone understand it well enough to use it.

The dream of the Semantic Web vision is that someday there will be thousands or millions of ontologies around the web, and millions of instances of them. And these will all somehow be integrated automagically, or at least if they aren’t integrated on the semantic level, then there will be magic software that embodies that integration. In any case, the hope is that someday intelligent agents will be able to freely and seamlessly roam around harvesting this data, squishing it together into knowledge bases, and reasoning across them. But neither harvesting, nor squishing, nor reasoning can really take place without some level of semantic integration of the underlying ontologies. Yet, how will all these disparate ontologies be connected? Unless mappings are created between them, instead of a Semantic Web, we’ll just have millions of little semantic silos. Maybe some company will succeed in making the biggest silo and that will be “the” semantic web to most people. That might be the best solution in fact, but I’m not sure that is really what Tim Berners-Lee had in mind! If that is not the solution that the semantic web community wants, then the integration issue needs to be solved sooner rather than later. The longer we wait to solve this, the harder it will get to solve it later on, because the number of ontologies is increasing with time.

So in conclusion, I think that the most critical missing piece of the semantic web puzzle is a good tool, and a good methodology, for mapping between ontologies. I just haven’t found one yet (but if you have, feel free to suggest it to me!). The reason I think a mapping tool is a critical need is this: while in theory it’s a nice idea to imagine ontologists reusing ontologies from one another, in practice many ontologists (especially those working on large complex ontologies) would rather write their own internally consistent ontologies and map them to other ontologies than import other ontologies into what they are making and then have to deal with all the inconsistencies and confusion that arise from doing that. In practice, ontologists are usually people who value elegance and consistency. A solution that runs contrary to those values won’t really be adopted by that community (of which I am a member).

The OWL language provides a means to express mappings between equivalent classes and equivalent properties, for example, and that might be good enough. But I haven’t seen good support for actually building and managing such mappings within the ontology development tools I’ve looked at. And until this process of mapping between ontologies is made far more productive and powerful, we will see increasing fragmentation instead of integration across the semantic web. Similarly, in tools like Protege you can import other ontologies, but once you do so, very little support is provided for working with and modifying the new combined ontology.
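As a rough illustration of what such a mapping looks like when written down, here is a small Python sketch that turns a hand-maintained mapping table into Turtle statements using OWL's owl:equivalentClass and owl:equivalentProperty. The helper name mapping_to_turtle, the my: and ext: prefixes, the example.org IRIs and the class and property names are all made up for the example; only the two OWL terms and the OWL namespace come from the language itself.

```python
OWL_PREFIX = "@prefix owl: <http://www.w3.org/2002/07/owl#> ."

def mapping_to_turtle(prefixes, class_map, property_map):
    """Render a mapping table as Turtle equivalence statements.

    prefixes:     dict of prefix name -> namespace IRI.
    class_map:    dict of my class -> external class.
    property_map: dict of my property -> external property.
    """
    lines = [OWL_PREFIX]
    for name, iri in sorted(prefixes.items()):
        lines.append(f"@prefix {name}: <{iri}> .")
    lines.append("")  # blank line between prefixes and statements
    for mine, theirs in sorted(class_map.items()):
        lines.append(f"{mine} owl:equivalentClass {theirs} .")
    for mine, theirs in sorted(property_map.items()):
        lines.append(f"{mine} owl:equivalentProperty {theirs} .")
    return "\n".join(lines)

# Hypothetical namespaces and terms, for illustration only.
turtle = mapping_to_turtle(
    prefixes={
        "my": "http://example.org/my#",
        "ext": "http://example.org/ext#",
    },
    class_map={"my:Person": "ext:Human", "my:Company": "ext:Organization"},
    property_map={"my:worksFor": "ext:employedBy"},
)
print(turtle)
```

Keeping the mappings in their own table like this, separate from either ontology, is one way to manage them without importing anything, but it does nothing to check whether the asserted equivalences are actually sound.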

The requirements for a good semantic integration tool are numerous. But chief among them is that such tools need to move beyond merely helping with integration between two ontologies; they need to help an ontologist map their ontology to perhaps tens of other ontologies. There will also need to be specialized error checking capabilities and consistency checkers, to look for logical problems and inheritance incompatibilities that may arise in complex mappings, and to identify classes and properties that should be mapped but were missed. Perhaps by analysing instance data from different ontologies (such as different ontologies’ representations of the same unique entities or concepts) these tools could even learn or suggest mappings in order to assist or automate the mapping process to some degree. I have seen papers on automatic ontology mapping, but these capabilities haven’t made it into the ontology design tools. This needs to happen.
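Here is a minimal sketch of what the instance-based idea could look like, in Python. It assumes that both ontologies describe some of the same entities and that those entities can be matched by a shared identifier, which is a big assumption in practice. The function names, the 0.5 threshold, and the class names and ids are all hypothetical; this is one simple heuristic (Jaccard overlap of instance sets), not a description of any existing tool.

```python
from itertools import product

def jaccard(a, b):
    """Overlap of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest_mappings(instances_a, instances_b, threshold=0.5):
    """Suggest class pairs whose instance sets overlap strongly.

    instances_a, instances_b: dict of class name -> set of entity ids.
    Returns (class_a, class_b, score) tuples, best score first.
    """
    suggestions = []
    pairs = product(instances_a.items(), instances_b.items())
    for (cls_a, ids_a), (cls_b, ids_b) in pairs:
        score = jaccard(ids_a, ids_b)
        if score >= threshold:
            suggestions.append((cls_a, cls_b, round(score, 2)))
    return sorted(suggestions, key=lambda s: -s[2])

# Hypothetical instance data; ids stand for the same real-world entities.
instances_a = {
    "my:Company": {"id:1", "id:2", "id:3"},
    "my:Person": {"id:7", "id:8"},
}
instances_b = {
    "ext:Organization": {"id:1", "id:2", "id:3", "id:4"},
    "ext:Human": {"id:7", "id:8"},
}

suggestions = suggest_mappings(instances_a, instances_b)
print(suggestions)
# [('my:Person', 'ext:Human', 1.0), ('my:Company', 'ext:Organization', 0.75)]
```

A score below 1.0, as with my:Company and ext:Organization here, hints that the classes overlap without being truly equivalent, which is exactly the kind of case a human ontologist would still need to review.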

Until the process of integrating ontologies is less work than simply reinventing the wheel, we are not going to see much semantic integration on the semantic web. In short, the vision of the semantic web as a decentralized fabric in which multiple ontologies interoperate really hangs on a good solution to this issue.

I believe the semantic web is emerging and will continue to evolve even if semantic integration is not made easy — but in such a case, I think ultimately it will be dominated by a few large ontologies and service providers that everyone integrates with, rather than the original vision of a more decentralized system.