The Ontology Problem: A Definition with Commentary

The Ontology Problem is a fundamental challenge of the emerging Semantic Web. It comprises three key sub-problems: the Upper Ontology Problem, the Domain Ontology Problem, and the Ontology Integration Problem, each described in detail below.

 

The Upper Ontology Problem

When representing the world with ontologies, we need certain basic
“building block” concepts before we can ontologically define
higher-level concepts. For example, before we can really define what we
mean by a “geographic region” we first need basic definitions of
building block concepts such as “planet,” “geographic location,” “set,”
“boundary,” “container” and “content of,” and perhaps “elevation,”
“longitude,” “latitude” and so on. Until we have defined these building
blocks we cannot build a semantic definition of what a geographic
region really is. It turns out that at least when describing our
consensus reality, there is a relatively small set of building block
concepts that are needed by most ontologies. We can call a set of
building-block concepts an “Upper Ontology.”
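
The dependency between a higher-level concept and its building blocks can be sketched in a few lines of Python. This is only an illustration: the table `REQUIRES` and the function `missing_building_blocks` are hypothetical names, the first table entry follows the "geographic region" example above, and the second entry is an assumption added to show that building blocks can have building blocks of their own.

```python
# Hypothetical dependency table: each concept maps to the concepts its
# definition relies on. Only the first entry follows the article's example;
# the second is an assumption added to show nested dependencies.
REQUIRES = {
    "geographic region": {
        "planet", "geographic location", "set",
        "boundary", "container", "content of",
    },
    "geographic location": {"planet", "longitude", "latitude", "elevation"},
}


def missing_building_blocks(concept, defined, requires=REQUIRES):
    """Return every concept that still needs a definition before `concept` can have one."""
    missing = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for dependency in requires.get(current, set()):
            if dependency in defined or dependency in missing:
                continue
            missing.add(dependency)
            stack.append(dependency)  # an undefined concept may need blocks of its own
    return missing


if __name__ == "__main__":
    # A toy "Upper Ontology" that so far defines only three building blocks.
    upper_ontology = {"planet", "set", "boundary"}
    print(sorted(missing_building_blocks("geographic region", upper_ontology)))
```

An empty result means that every building block is already available, which is the situation a shared Upper Ontology is meant to create for domain-ontology developers.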

Upper Ontologies are harder to design than domain ontologies in a
certain respect — they are generally both more granular and more
macroscopic, and generally the concepts they define are more abstract
and often epistemological in nature. While someone may be a domain
expert in their own field and be able to design a fairly decent
ontology about their domain, designing a truly suitable Upper Ontology
is a different specialization altogether. This distinction is similar
to the difference between a programmer who designs compilers and
development environments and a programmer who writes software using
such compilers and IDEs. These two types of programmers generally have
different categories of skills and knowledge. Similarly, Upper
Ontologies and the skills needed to design them are quite different
from Mid-Level or Lower domain ontologies and the skillsets they
require.

The Upper Ontology Problem is simply that there is no generally accepted, comprehensive, standardized Upper Ontology in use today. When developing a domain ontology, developers must therefore either:

(a) Develop their own Upper Ontology first (a big task that they shouldn’t have to undertake, and probably don’t have time to complete),
(b) Use one of the various existing Upper Ontologies such as SUO/SUMO, OpenCyc, or other proposed Upper Ontologies. This choice is difficult for a non-specialist developer to make: they may not know how to assess the relative value of these different ontologies, and they may not know enough about the languages in which the various ontologies are expressed to understand them without extensive study first. Worse, by choosing one such Upper Ontology, all of their own ontology’s next-level concepts automatically become “upper-ontology-dependent” and not necessarily compatible with the other Upper Ontologies they did not choose, or
(c) Decide not to use an Upper Ontology at all (the choice made in ontologies such as FOAF). This makes things simpler for the moment, but it also results in an “ontological light-cone” or “ontology horizon” of sorts beyond which the concepts in the ontology become ambiguous and essentially undefined.

None of these choices is easy to make, and none is optimal.

Domain-ontology developers should not have to worry about also
developing their own Upper Ontologies. Instead, either there should be
one truly good standard Upper Ontology, or there should at least exist
a meta-ontology that maps all the concepts in the most common Upper
Ontologies to one another so that it doesn’t matter which one is used.
But this hasn’t happened yet.

Note: A terrific Upper Ontology that I highly recommend (disclaimer: I helped develop major parts of it) is the University of Texas CLIB ontology, which, by the way, is open-source (it says GPL but will actually be LGPL soon). You can view all the current builds here. In particular, I would suggest looking at the OWL version, which is a scaled-down subset of the full ontology (which is in KM, a more expressive axiomatic language). I have contributed a large number of classes and relations to this version of CLIB, so feel free to ask me questions if you would like to discuss this further. Please note that this ontology is still evolving, so if you build on it, you might want to let us know and keep up with changes by checking the builds frequently.

 

The Domain Ontology Problem

A few useful general-purpose mid-level and lower-level (“domain
level”) ontologies exist. For example, FOAF is an ontology about people
and relationships, DOAP is a proposed ontology about projects, the
Dublin Core is an ontology of the most basic properties of library
resources, etc. There are even highly detailed ontologies developed to
describe various medical domains, commerce domains and military
domains. However, it is safe to say that the vast majority of vertical
subject domains have yet to be modeled ontologically, let alone
released in an open manner.

There are simply so many knowledge niches in the world — even huge
ontologies containing tens of thousands of class definitions, such as
OpenCyc, are still relatively limited in their conceptual breadth,
depth and resolution. In order for all types of information and
knowledge to be expressible and accessible in the Semantic Web,
ontologies for all these specialized domains need to be developed and
made publicly available in some manner. Furthermore, they need to
somehow connect together via a solution to the above Upper Ontology
Problem so that they can be normalized and mapped to one another
easily. Until that happens the Semantic Web will still be incredibly
useful, but only for representing and accessing general knowledge or
working with domain-specific concepts that are defined by the small set
of currently existing domain ontologies.

The solution: More ontologies need to be created about new, vertical
domains, and mapped to common open Upper Ontologies. Easier said than
done! Before domain-ontologies will be created someone has to come up
with a compelling benefit for doing so — for example, applications or
services that make use of these domain-ontologies to solve problems
that real people actually have and need solutions to.

 

The Ontology Integration Problem 

As alluded to above, it is one thing to develop an ontology but
quite another to make it compatible with other existing ontologies.
This is the Ontology Integration Problem. This problem turns out to be
far more subtle than most people who currently write about the Semantic
Web have noted as of yet. Integrating ontologies is not as simple as
just mapping classes in one ontology to corresponding classes in
another ontology, because it is not merely the names
and properties of classes that are significant to defining their
meanings and mappings, but also their inheritance paths in their
respective ontologies. For example, consider these two ontology class
outlines:

    • Ontology A
      • Thing
        • Legal Person
          • Human
          • Corporation
        • Living Thing
          • Person
        • Organization
          • Corporation
        • Professional Occupation
          • Lawyer
    • Ontology B
      • Thing
        • Living Thing
          • Person
            • Legal Person
              • Lawyer
        • Non-Living Thing
          • Organization
            • Legal Organization
              • Corporation

If we mapped these ontologies to one another simply by virtue of
mapping “Person” in Ontology A to the class “Person” in Ontology B, we
would wrongly be implying that Ontology A’s concept “Person” is
equivalent to Ontology B’s concept “Person.” However, there is a big
difference in actual meaning between what these two ontologies mean by
“Person.” This difference comes from semantics implied by ontology
class inheritance differences between the two classes for “Person” in
these two ontologies. Ontology A uses “Person” to mean a human with
legal status in some legal system. Ontology B says that a Person is
simply some type of “Living Thing” but not necessarily a legal entity.
In other words, in Ontology A, “All Persons are Legal Entities” while
in Ontology B, “Some Persons may be Legal Entities,” while others may
not be. Similarly, consider how to map between the concepts of “Legal Person” in the two ontologies, an even hairier problem.
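
To make the role of inheritance paths concrete, here is a minimal Python sketch that encodes the two outlines above as child-to-parents tables and compares the ancestors of classes that share a name. The dictionary layout and the helper names `ancestors` and `compare_shared_classes` are illustrative assumptions, not part of any ontology toolkit, and the tables follow the outlines exactly as printed.

```python
# Each outline is encoded as {class name: set of direct parent names}.
# This layout and the helpers below are illustrative assumptions only.
ONTOLOGY_A = {
    "Thing": set(),
    "Legal Person": {"Thing"},
    "Human": {"Legal Person"},
    "Living Thing": {"Thing"},
    "Person": {"Living Thing"},
    "Organization": {"Thing"},
    "Corporation": {"Legal Person", "Organization"},  # appears twice in outline A
    "Professional Occupation": {"Thing"},
    "Lawyer": {"Professional Occupation"},
}

ONTOLOGY_B = {
    "Thing": set(),
    "Living Thing": {"Thing"},
    "Person": {"Living Thing"},
    "Legal Person": {"Person"},
    "Lawyer": {"Legal Person"},
    "Non-Living Thing": {"Thing"},
    "Organization": {"Non-Living Thing"},
    "Legal Organization": {"Organization"},
    "Corporation": {"Legal Organization"},
}


def ancestors(ontology, name):
    """Return every class reachable by following parent links upward from `name`."""
    found = set()
    stack = list(ontology[name])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(ontology[parent])
    return found


def compare_shared_classes(a, b):
    """For each class name found in both ontologies, collect the ancestors unique to each side."""
    report = {}
    for name in sorted(set(a) & set(b)):
        only_a = ancestors(a, name) - ancestors(b, name)
        only_b = ancestors(b, name) - ancestors(a, name)
        if only_a or only_b:
            report[name] = (only_a, only_b)
    return report


if __name__ == "__main__":
    for name, (only_a, only_b) in compare_shared_classes(ONTOLOGY_A, ONTOLOGY_B).items():
        print(f"{name}: only in A {sorted(only_a)}, only in B {sorted(only_b)}")
```

On the outlines as written, this flags “Corporation,” “Lawyer,” “Legal Person” and “Organization”: a name-only mapping would treat each of those pairs as equivalent even though they sit in different places in the two hierarchies. Differences that live in subclasses or in property definitions rather than in ancestors would need checks beyond this sketch.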

The difficulty in integrating these two ontologies is in figuring
out how to express the similarity and difference in meaning between
these two concepts of “Person.” One answer is to create a new third
ontology that attempts to unify the concepts in both Ontology A and
Ontology B, which can then be used to map between them. This is a
kind of “semantic middleware” approach; its weakness is that it only
applies to the mapping between these two ontologies and cannot be
extended easily to map to additional ontologies, or for different
subject domains.

Another approach might be to instead develop a general semantics for
expressing inter-ontology mapping concepts — and then use this
meta-ontology to create instances that express mappings between classes
and properties in various ontologies. This approach is of particular
long-term value; however, it is not simple to accomplish. Gradations
in semantic intent are notoriously subtle and complex to codify, and to
my knowledge nobody has developed an ontology which attempts to
formalize them in an open, ontology-independent manner (although prior
work in OIL was in that direction).
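
As a rough illustration of what “instances that express mappings” could look like, the Python sketch below records each mapping as data with an explicit relation type instead of assuming equivalence. The `Relation` values and the `ClassMapping` record are hypothetical names chosen for this example; they are not taken from OIL or from any existing toolkit, and a real mapping vocabulary would need far finer gradations than these four.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Hypothetical mapping relations, deliberately coarse."""
    EQUIVALENT = "equivalent to"
    BROADER_THAN = "broader than"
    NARROWER_THAN = "narrower than"
    OVERLAPS = "overlapping with"


# The inverse of each relation, so a mapping can be read in either direction.
INVERSE = {
    Relation.EQUIVALENT: Relation.EQUIVALENT,
    Relation.BROADER_THAN: Relation.NARROWER_THAN,
    Relation.NARROWER_THAN: Relation.BROADER_THAN,
    Relation.OVERLAPS: Relation.OVERLAPS,
}


@dataclass(frozen=True)
class ClassMapping:
    """One mapping instance between a class in one ontology and a class in another."""
    source_ontology: str
    source_class: str
    relation: Relation
    target_ontology: str
    target_class: str
    note: str = ""

    def inverted(self):
        """Return the same mapping read from the target's point of view."""
        return ClassMapping(
            self.target_ontology, self.target_class,
            INVERSE[self.relation],
            self.source_ontology, self.source_class,
            self.note,
        )

    def describe(self):
        return (f"{self.source_ontology}:{self.source_class} is "
                f"{self.relation.value} {self.target_ontology}:{self.target_class}")


if __name__ == "__main__":
    # One plausible reading of the outlines above, offered as a judgment call:
    # A's "Legal Person" also covers corporations, B's covers only persons.
    mapping = ClassMapping("A", "Legal Person", Relation.BROADER_THAN,
                           "B", "Legal Person",
                           note="A includes Corporation; B does not")
    print(mapping.describe())
    print(mapping.inverted().describe())
```

Keeping the relation explicit is the point of the design: a consumer of the mapping can tell “same name, same meaning” apart from “same name, wider meaning” without re-reading both ontologies.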

Having worked on some very large ontology integration problems
(integrating two partially overlapping ontologies, each with several
thousand classes and properties defined, and expressed in different
ontology markup languages with different expressive power, for
example), I can tell you that the difficulty of such integration
increases exponentially with the number of concepts being integrated.

Because mapping between different ontologies is quite difficult,
even more difficult than designing new ontologies from scratch, most
ontology developers take the latter approach. Thus we have few mappings
between existing ontologies, and an increasing number of small,
non-integrated ontologies about different domains. While it is easy to
state that these various ontologies can be integrated such that they
eventually all connect together, the task of actually doing such
integration is difficult in practice. If the Semantic Web is really
going to one day “link together islands of meaning” in different places,
we must solve the Ontology Integration Problem.

The alternatives are unacceptable. If we don’t solve it, we will
end up with either (a) lots of totally incompatible ontologies and
knowledge based on them (just more “data silos,” which is precisely what
the Semantic Web was supposed to eliminate!), or (b) an incomplete set of
partially incorrect mappings between ontologies (because nobody has
time to map each ontology to every other ontology, and furthermore,
even if they tried, without adequate mapping semantics, such mappings
would contain partial truths or even glaring errors and contradictions).

If the Ontology Integration Problem is not solved it will not be
possible to answer a semantic search query across the open Web for a
question such as “find all software products that work with Linux and
are open-source and are endorsed by people or companies I trust.” Why
not? Because while there could be tons of raw RDF and OWL instance data
out on the Web that is relevant from various ontologies, unless it
either all uses the same ontology or all the ontologies that various
instances refer to are integrated, the query agent will have no way of
making sense of or normalizing the results. Of course, the query agent
could simply run the query on all data from all ontologies it knows
about, and then just present the results in a single list, sorted by
ontology — but as we’ve seen above, different ontologies might mean
different things by classes with the same names — and thus the results
returned may not really be relevant or well-ordered.

Another solution that has been proposed is to automate this process
by perhaps using learning and logic agents to analyze ontological
structures and/or the data-sets corresponding to various ontologies, in
order to automatically learn or derive rules and mappings that
integrate them. I personally doubt that the automated ontology mapping
approach will yield useful fruit anytime soon — there is still no
substitute for human domain-expertise in mapping between ontologies. It
simply requires too subtle an epistemological and semantic intelligence
for an automated program to do well.

I believe the solution will ultimately stem from a solution to the
Upper Ontology Problem. If we can solve that problem, then much of
the Ontology Integration Problem will go away, as most ontologies will
automatically be inter-mapped at the Upper Ontology level at least. If
we had a standard Upper Ontology, and if this standard were
also to include meta-level concepts for mapping between ontologies and
expressing differences in meaning between sets of classes in different
ontologies, then integrating ontologies would certainly be easier.
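
The arithmetic behind this hope is simple. If every ontology must be mapped directly to every other one, the number of mappings grows quadratically with the number of ontologies, whereas mapping each ontology once to a shared Upper Ontology grows only linearly. The Python sketch below counts both cases; the function names are illustrative, and the count deliberately ignores the effort inside each individual mapping, which is the hard part discussed above.

```python
def pairwise_mappings(n_ontologies):
    """Mappings needed when each ontology is mapped directly to every other one."""
    return n_ontologies * (n_ontologies - 1) // 2


def hub_mappings(n_ontologies):
    """Mappings needed when each ontology is mapped once to a shared Upper Ontology."""
    return n_ontologies


if __name__ == "__main__":
    for n in (5, 50, 500):
        print(f"{n} ontologies: {pairwise_mappings(n)} pairwise, {hub_mappings(n)} via a hub")
```

For 50 ontologies that is 1225 direct mappings against 50 mappings to the shared Upper Ontology.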

Note: See also this related article on how to design richer semantics using Roles as a design pattern in ontologies.


5 Responses to The Ontology Problem: A Definition with Commentary

  1. Pat says:

    Brilliant summary of ontology problems. One question. Is there a significant difference between taxonomy and ontology?
    (In the past when thinking about the problems you outline I have always used the word taxonomy in my mind, but that’s probably because I was unaware that ontology was the technical word)

  2. Nova says:

    Yes there is a difference between a taxonomy and an ontology. An ontology defines not only the classes (the things) but also their properties, such as their attributes and the types of relationships they can have to one another, as well as possible restrictions on how the classes and properties can be instantiated. Consider this example:
    A TAXONOMY:
    1. Thing
      1.a. Living Thing
        1.a.i. Human
        1.a.ii. Animal
    AN ONTOLOGY:
    (Class Thing
      hasName (string)
    )
    (Class LivingThing
      is-a Thing
      hasName (string)
      hasBirthDate (datetime)
      hasLocation (GeographicLocation)
      hasWeight (long)
    )
    (Class Human
      is-a LivingThing
      is-a Thing
      hasName (string)
      hasBirthDate (datetime)
      hasLocation (GeographicLocation)
      hasWeight (long)
      hasFirstName (string)
      hasLastName (string)
      hasCityOfResidence (City)
      hasNationality (Nation)
      hasGender (“male” or “female”)
      hasHeight (long)
      hasEthnicity (Ethnicity)
      hasFriend (Role: Friend)
      hasEmploymentHistory (EmploymentHistory)
      …..
    )

  3. pat says:

    thank you for the clarification. interesting. i knew there was a reason I subscribed to your feed. =)

  4. Great breakdown of ontology issues…
    But if one of the goals of the Semantic Web is to demolish the concept of “data silos” as well as the rigorous “structured data” concept- doesn’t using such a complex ontology put us back to square one anyway? I feel like we’re moving away from the idea of having the DATA carry organizational information.
    Our brains don’t work this way- we don’t have an incredibly complex “card catalog” of semantics in our heads.
    If you did implement a “perfect” Upper Ontology, as well as a standard functional Ontology – how do all of the “slots” get filled with data about data?
    And won’t that generate “meta-data” that is as expansive as the current Web itself?

  5. Adam Pease says:

    Good article! I’d like to mention the Suggested Upper Merged Ontology http://www.ontologyportal.org. Unlike the other available upper ontologies, SUMO has a wealth of domain ontologies that should make it clearer for a given user how he or she might get started using some relevant concepts. Another unique feature is a set of mappings to all of WordNet, which allows the new users to enter most any English word and find out which SUMO terms are relevant.