It’s been a while since I posted about what my stealth venture, Radar Networks, is working on. Lately I’ve been seeing growing buzz in the industry around the "semantics" meme; at the recent DEMO conference, for example, several companies used the word "semantics" in their pitches. And of course several companies in this area have raised funding in the last year, Radar Networks among them.
Clearly the "semantic" sector is starting to heat up. As a result, I’ve been getting a lot of questions from reporters and VCs about how what we are doing compares to other companies such as Powerset, TextDigger, and Metaweb. There was even a rumor that we had already closed our Series B round! (That rumor is not true; in fact the round hasn’t started yet, although I am getting very strong VC interest and we will start the round pretty soon.)
In light of all this I thought it might be helpful to clarify what we are doing, how we understand what other leading players in this space are doing, and how we look at this sector.
Indexing the Decades of the Web
First, one thing to clear up. The Semantic Web is part of what some are calling "Web 3.0," but in my opinion it is really just one of several converging technologies and trends that will define this coming era of the Web. I’ve written in more detail about a proposed definition of Web 3.0 here.
For those of you who don’t like terms like Web 2.0 and Web 3.0, I agree: we all want to avoid a rapid series of such labels, or an arms race of companies claiming to be > x.0. So I have a practical proposal: let’s use these terms to index decades since the Web began. This is objective, since we can all agree on when decades begin and end, and looking back, each decade is characterized by various trends.
I think this is a reasonable and actually useful proposal (and it also avoids endless new x.0s being announced every year). Web 1.0 was therefore the first decade of the Web: 1990 – 2000. Web 2.0 is the second decade, 2000 – 2010. Web 3.0 is the coming third decade, 2010 – 2020, and so on. Each of these decades is (or will be) characterized by particular technology movements, themes and trends, and these indices, 1.0, 2.0, etc., are just a convenient way of referencing them. This is a useful way to discuss history, and it’s not without precedent: various dynasties and historical periods are also given names, which provide a shorthand way of referring to those periods and their unique flavors. To see my timeline of these decades, click here.
So with that said, what is Radar Networks actually working on? First of all, Radar Networks is still in stealth, although we are planning to go beta in 2007. Until we get closer to launch, what I can say without an NDA is limited. But I can at least offer some hints for those who are interested, along with what I hope is a helpful tutorial about natural language search and the Semantic Web and how they differ. I’ll also discuss how Radar Networks compares to some of the key startup ventures working with semantics in various ways today (there are many other companies in this sector; if you know of any interesting ones, please let me know in the comments, since I’m starting to compile a list).
Semantic Social Software: The Semantic Web for Consumers
Here at Radar Networks, we are building a next-generation Web-based online service that will bring the Semantic Web to consumers and professionals across the Web. This application is focused on enabling the next generation of social software (note that social software is not necessarily social networking; social networking is a subset of social software). It is an example of what "the Intelligent Web" will be like. We are very excited about this service and what it already does, but there’s still more to do before we launch.
Our app is based on the Semantic Web. It will enrich and facilitate more intelligent online relationships, community, content, collaboration and even commerce. It will help bring the Semantic Web from research to reality by making it user-friendly, accessible and, most of all, directly useful and valuable to ordinary people. We are focused on providing value to consumers, not just developers or early adopters. But as I said, I can’t provide more details until we get closer to launch.
Our Web 3.0 Applications Platform
In order to build our product we had to first build a new platform
to support the kinds of features and capabilities we designed — we
could not find any existing platform that could do what we wanted to
do. Existing platforms for the Semantic Web were too research-oriented
and did not provide the levels of scalability, performance and
ease-of-use that we required.
We have been working on this platform over several years and several generations of our codebase. It is now very robust and sophisticated. We believe it is also significantly more scalable and performant than any platform we’ve seen in the Semantic Web space to date.
Our platform is a comprehensive, Java-based framework for Semantic Web applications and services that has some similarities to Ruby on Rails (although it is also very different from RoR, and we are not going after the platform market; right now we are much more focused on our application). Our platform also includes a lot of other technology, such as our extremely fast and scalable storage layer for semantic data tuples, powerful semantic query capabilities, and a range of algorithms for analyzing data and doing intelligent things for users.
It could be called a "Web 3.0" applications platform because it is inherently based around RDF/OWL and the emerging Semantic Web. In addition to the "Web 3.0" aspects of what we are doing, our platform also makes heavy use of "Web 2.0" methods and technologies such as AJAX, REST, widgets, and RSS/Atom, to name a few.
What We are Not Doing: Natural Language Search
First of all, we at Radar Networks are NOT building a new search engine to compete with Google, as Powerset and TextDigger are doing, so we’re not competing with them. Companies like Powerset and TextDigger are working on natural language search. Natural language search is not equivalent to the Semantic Web, although the Semantic Web can certainly help that process.
Companies working specifically on natural language search are making use of semantics, but only at the word level. They use networks of words that are linked to synonyms, antonyms, homonyms and other variations. These are sometimes called semantic networks. Based on these networks of word meanings, their software can interpret the meaning of various words and expressions.
More sophisticated natural language search algorithms don’t look at words in isolation; they look at them in context, analyzing the grammar and the rest of the content around them. The point of natural language search is ultimately to match the meaning of the words in search queries to the content of documents, and to do this better than Google, which basically just matches keywords without paying attention to what the words mean.
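To make the word-level idea concrete, here is a minimal Python sketch of query expansion over a tiny, hand-built synonym network. The SYNONYMS table and the helper names are hypothetical; real systems use much larger lexical resources and also analyze grammar and context, which this sketch does not attempt.

```python
# Hypothetical, hand-built word network linking words to their synonyms.
SYNONYMS = {
    "car": {"automobile", "auto"},
    "automobile": {"car", "auto"},
    "auto": {"car", "automobile"},
    "cheap": {"inexpensive", "affordable"},
}

def tokenize(text):
    """Split on whitespace, drop trailing punctuation, lowercase."""
    return [w.strip(".,!?").lower() for w in text.split()]

def keyword_match(query, doc):
    """Plain keyword matching: every query word must literally appear."""
    words = set(tokenize(doc))
    return all(w in words for w in tokenize(query))

def expanded_match(query, doc):
    """Each query word may be satisfied by itself or by any linked synonym."""
    words = set(tokenize(doc))
    return all(({w} | SYNONYMS.get(w, set())) & words for w in tokenize(query))

doc = "Affordable automobile for sale."
print(keyword_match("cheap car", doc))   # False
print(expanded_match("cheap car", doc))  # True
```

Keyword matching misses the document because it never says "cheap" or "car"; the expanded match finds it by following the word network to "affordable" and "automobile."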
Essentially natural language search requires at least some level of artificial intelligence. Machine
understanding of natural language is a difficult problem and there has
been a lot of work on this over the last few decades. Today there are
many technologies that focus on this but the majority of them are based
on the assumption that software should do all the work to figure out
the meaning of information.
What We Are Doing: Semantic Web
In contrast to natural language search, which tries to derive the meanings of words, the emerging Semantic Web uses metadata to encode the meaning of information.
In this approach, the meaning of information can be explicitly encoded into the information itself, just as HTML tags are added to content today, and this can be done by people, by software, or even by communities. Once this meaning (its semantics) is explicitly encoded into content, it can be reused by other applications to make sense of the content. It’s worth noting that explicit semantics in content can help natural language processing apps as well as apps that don’t understand natural language at all.
In the Semantic Web approach, the meaning of the information is encoded using markup
languages such as RDF and OWL, which are W3C open standards. Words and concepts in the content of documents and data records can be marked up with RDF/OWL expressions to indicate what they mean — does a certain word or phrase such as "Lotus" for example, mean a software company, a software product, an exotic sportscar brand, or some other kind of concept? Without sophisticated natural language processing it is often difficult for software to determine this on its own. The Semantic Web provides markup codes that explicitly indicate the intended meaning of information in an unambiguous, machine-readable format.
Marking up content with additional metadata was possible before the Semantic Web, using XML: you could just write <sportscar>Lotus</sportscar>. The problem is that the meaning of "sportscar" still had to be hard-coded into applications for them to know what it implies. With RDF/OWL, that meaning can be formally encoded outside of applications in a set of definitions called an ontology. An ontology defines facts such as "a sportscar is a kind of car," "a car is a ground vehicle," "a car is a product," "a car is a device," "a sportscar is a recreational or competitive vehicle," etc.
If content is marked up with OWL indicating that it is a sportscar, that markup refers to the appropriate definitions in an ontology, and any application that can read the ontology can then infer these more specific intended meanings. The point is that the semantics become less ambiguous: they are explicitly encoded by the ontology, which functions as a kind of more advanced data schema.
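As a rough illustration of that inference step, the sketch below models the ontology facts above as (subclass, superclass) pairs and follows them transitively, which is roughly what an RDFS/OWL reasoner does with rdfs:subClassOf. The class names and the plain-Python data structures are assumptions for illustration, not a real ontology or RDF library.

```python
# A toy ontology: (subclass, superclass) pairs in the spirit of rdfs:subClassOf.
SUBCLASS_OF = {
    ("SportsCar", "Car"),
    ("SportsCar", "RecreationalOrCompetitiveVehicle"),
    ("Car", "GroundVehicle"),
    ("Car", "Product"),
    ("Car", "Device"),
}

def superclasses(cls):
    """Return every class that cls is, directly or transitively, a kind of."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS_OF:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found

# The content markup only says: this mention of "Lotus" is a SportsCar.
annotation = ("Lotus", "type", "SportsCar")
inferred = {annotation[2]} | superclasses(annotation[2])
print(sorted(inferred))
# ['Car', 'Device', 'GroundVehicle', 'Product', 'RecreationalOrCompetitiveVehicle', 'SportsCar']
```

The markup states one fact; everything else comes from the shared ontology, so any application that reads the ontology reaches the same conclusions.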
But this is an oversimplification. OWL ontologies can go a lot further than defining the meaning of concepts; they can also define the logical relations between them. For example, how exactly are two things connected, and are there any special restrictions on that connection? An ontology can define, for instance, that a person’s sister must be female, or that a person can have only one biological mother.
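Here is a hedged sketch of checking the two restrictions just mentioned against a handful of hypothetical facts. Note an important simplification: this is closed-world validation in plain Python. A real OWL reasoner works under the open-world assumption, so it would typically infer new facts (for example, that two differently named mothers must be the same individual) and report an inconsistency only when facts contradict each other, such as a sister asserted to be male when the male and female classes are declared disjoint.

```python
from collections import Counter

# Hypothetical facts as (subject, predicate, object) triples.
facts = [
    ("Ann", "sister", "Beth"),
    ("Beth", "gender", "female"),
    ("Ann", "sister", "Carl"),
    ("Carl", "gender", "male"),
    ("Ann", "biologicalMother", "Dora"),
    ("Ann", "biologicalMother", "Eve"),
]

def violations(triples):
    """Closed-world checks for two restrictions an ontology could declare."""
    gender = {s: o for s, p, o in triples if p == "gender"}
    problems = []
    # Restriction 1: anyone who is someone's sister must be female.
    for s, p, o in triples:
        if p == "sister" and gender.get(o) != "female":
            problems.append(f"{o} is listed as {s}'s sister but is not female")
    # Restriction 2: a person has at most one biological mother.
    mothers = Counter(s for s, p, _ in triples if p == "biologicalMother")
    for person, count in mothers.items():
        if count > 1:
            problems.append(f"{person} has {count} biological mothers")
    return problems

for problem in violations(facts):
    print(problem)
```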
All kinds of apps can benefit from the extra hints about the meaning of
the information that can be provided by Semantic Web metadata around
content. For example, even a natural language search engine could do
less analysis and would need less intelligence, if it could leverage
existing semantic metadata that was already in content.
It’s important to note that users and application developers don’t necessarily ever have to look at RDF or OWL code (thank heavens!); they can just work with objects and forms as they already do on the Web, and the underlying markup can be created automatically for them. Nobody should have to look at raw RDF and OWL (unless they really want to), and the Semantic Web doesn’t force anyone to. For example, most of us don’t write HTML or XML or CSS by hand, but if we are using blogs or wikis, or even posting listings on sites like job boards and auctions, we are doing things that result in HTML, XML and CSS being created.
It should be clear from the above that natural language search is a specific process that makes use of word-level semantics, whereas the Semantic Web is a broad set of technologies for defining the meaning of any kind of information (including, but not limited to, words). The Semantic Web can help improve natural language search, but today many natural language search algorithms do not make use of the Semantic Web or RDF/OWL data structures. However, as these technologies begin to converge (as they are doing here at Radar Networks, in fact), we will see new levels of accuracy become possible: the combination of traditional natural language processing and the richer semantics of RDF/OWL markup enables even more powerful machine understanding and processing of text. That said, I want to be clear once again that Radar Networks is not a search company, although we do use next-generation semantic search quite extensively in our application and platform.
Any application that can understand RDF/OWL can correctly interpret the meaning of any content that is marked up with RDF/OWL metadata. If a news article that mentions "Paris" many times is marked up with RDF/OWL metadata, then any app that can understand that metadata can, for example, correctly determine that the article is about the place Paris, Texas, not Paris, France, and not the person Paris Hilton either. The application doesn’t have to do any fancy natural language processing to know this. Even a relatively "dumb" application that has no ability to do natural language processing can still make sense of a document if it can at least understand RDF/OWL.
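The sketch below shows how little a "dumb" application needs in order to use such metadata: it never parses the article text, it just follows an "about" link to a concept identifier. The ex: identifiers, the "about" property and the dictionary layout are hypothetical stand-ins for real RDF vocabulary and URIs.

```python
# Hypothetical concept identifiers standing in for real URIs.
CONCEPTS = {
    "ex:Paris_Texas": {"type": "City", "label": "Paris, Texas"},
    "ex:Paris_France": {"type": "City", "label": "Paris, France"},
    "ex:Paris_Hilton": {"type": "Person", "label": "Paris Hilton"},
}

article = {
    "id": "ex:article42",
    "text": "Paris officials met today. Paris expects a busy summer.",
    # Metadata added upstream by an editor, a reader, or an NLP tool.
    "metadata": [("ex:article42", "about", "ex:Paris_Texas")],
}

def topics(doc):
    """Follow 'about' links in the metadata; the text itself is never parsed."""
    return [CONCEPTS[o]["label"]
            for s, p, o in doc["metadata"]
            if s == doc["id"] and p == "about"]

print(topics(article))  # ['Paris, Texas']
```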
So how does this explicit semantic markup, in the form of RDF/OWL metadata, get into the document in the first place? Well, it could have been added automatically by some other software app that did natural language processing on it, or it could have resulted from newspaper editors and/or even readers categorizing and/or tagging the document with tags for places, people, etc., in a manner not unlike how they tag content in services like Flickr today. The main point here is that adding the semantic metadata does not require the apps that create or consume the content to understand natural language, nor does it require people to be XML coders: even regular end users can help define the semantics of content simply by tagging it. The Semantic Web provides a much richer and more expressive framework for doing this than today’s Web 2.0 "tags," but it’s not that far off either.
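As a loose sketch of how ordinary tagging could yield structured metadata, suppose (hypothetically) that a tagging interface lets readers prefix a tag with its kind, such as "place:" or "person:". Typed tags then become subject-predicate-object statements, while plain tags are kept as-is with their meaning left implicit. The prefix syntax and the predicate names are assumptions for illustration, not an existing standard.

```python
# Hypothetical mapping from a tag prefix to a predicate name.
PREDICATES = {"place": "mentionsPlace", "person": "mentionsPerson"}

def tag_to_triple(resource, tag):
    """Turn 'place:Paris, Texas' into (resource, 'mentionsPlace', 'Paris, Texas')."""
    kind, sep, value = tag.partition(":")
    if sep and kind in PREDICATES:
        return (resource, PREDICATES[kind], value.strip())
    return (resource, "taggedWith", tag)  # plain tag: meaning stays implicit

for t in ["place:Paris, Texas", "person:Jane Doe", "politics"]:
    print(tag_to_triple("article42", t))
```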
The Semantic Web can enhance word-level understanding and processing
of text in many ways, but note that it is not limited only to
word-level applications. The Semantic Web provides a way to make any
information more understandable to other applications — including data
records in databases, documents on the desktop and the Web, enterprise
data, photos, videos, music, and even Web services and software code.
Simple Examples of Semantics
Today, for example, there is a big problem in integrating data across applications. In the enterprise, one application might define a record called a "Customer" while another might call that concept a "Client." If a user then searches for "Customers," they won’t necessarily also find records for Clients. But using the Semantic Web, the data records for Customers and Clients can be mapped together so that applications can treat them as equivalent, and any search for one will return the other as well. Not only can records be mapped to each other, but the fields of those records can be mapped together too. For instance, the Customer record might have a field named "Referred by" while the Client record might have a field called "Introduced by"; these can be mapped together as well.
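A minimal sketch of the Customer/Client example, assuming two small record sets and hand-written equivalence mappings in the spirit of OWL's owl:equivalentClass and owl:equivalentProperty. Everything here (record layout, mapping tables, function names) is illustrative plain Python rather than an RDF store; a real system would state the mappings in an ontology and let a reasoner or query engine apply them.

```python
# Two hypothetical applications describing the same kind of record differently.
crm_records = [{"type": "Customer", "name": "Acme", "Referred by": "Bob"}]
billing_records = [{"type": "Client", "name": "Globex", "Introduced by": "Sue"}]

# Hand-written mappings, in the spirit of owl:equivalentClass / owl:equivalentProperty.
EQUIVALENT_CLASSES = [{"Customer", "Client"}]
EQUIVALENT_FIELDS = [{"Referred by", "Introduced by"}]

def equivalents(term, groups):
    """Return the set of names mapped to term (including term itself)."""
    for group in groups:
        if term in group:
            return group
    return {term}

def search(record_type, records):
    """Return records whose type is record_type or any mapped equivalent."""
    wanted = equivalents(record_type, EQUIVALENT_CLASSES)
    return [r for r in records if r["type"] in wanted]

def field(record, name):
    """Read a field by its own name or by any mapped equivalent name."""
    for alias in equivalents(name, EQUIVALENT_FIELDS):
        if alias in record:
            return record[alias]
    return None

all_records = crm_records + billing_records
for r in search("Customer", all_records):
    print(r["name"], "referred by", field(r, "Referred by"))
```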
A similar example applies to a consumer use case such as shopping: different stores describe the same product with different terms. In one store a laptop is called a "laptop computer," in another it is called a "portable computer," and yet another calls it a "desktop replacement." A search for any of these terms should return products described with any of them. Within a single commerce site this is not so hard, but what about searching across many commerce sites (which isn’t even easy to do at all today)? If different commerce sites used the same underlying semantic metadata definitions to mark up their products, users could search across them with less trial and error, and they would get better results.
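For the shopping case, here is a hedged sketch in which each hypothetical store keeps its own product wording but tags every listing with a shared concept identifier, so a single lookup works across stores. The store names, the ex:PortableComputer identifier and the term table are all made up for illustration.

```python
# Each store uses its own wording, but marks listings with a shared concept ID.
store_listings = {
    "store-a.example": [{"title": "Laptop computer", "concept": "ex:PortableComputer"}],
    "store-b.example": [{"title": "Portable computer", "concept": "ex:PortableComputer"}],
    "store-c.example": [{"title": "Desktop replacement", "concept": "ex:PortableComputer"},
                        {"title": "Desk lamp", "concept": "ex:Lamp"}],
}

# A shared vocabulary mapping everyday search terms to concepts.
TERM_TO_CONCEPT = {
    "laptop": "ex:PortableComputer",
    "laptop computer": "ex:PortableComputer",
    "portable computer": "ex:PortableComputer",
    "desktop replacement": "ex:PortableComputer",
}

def search_all_stores(term):
    """Resolve the term to a concept once, then match listings in every store."""
    concept = TERM_TO_CONCEPT.get(term.lower())
    return [(store, item["title"])
            for store, items in store_listings.items()
            for item in items
            if item["concept"] == concept]

print(search_all_stores("Portable computer"))
```

Because the stores agree on the concept rather than on the wording, adding a new synonym means updating one shared table instead of every store's search code.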
Of course, the technology for mapping between databases is not new (there are many ways to do this), but the Semantic Web provides a way to do it that may be more open and efficient in the long run. Central to this approach is that an organization or online service can use ontologies that define key concepts centrally and rigorously. So instead of every app and data record having to be individually mapped to every other, they can potentially all just map to the central ontology, which functions as a kind of semantic switchboard. All applications and queries can use a common ontology (or set of them) to unify access to data records across many different online services and databases. In a sense, ontologies provide a way to define and share common languages for data, content, relationships and applications.
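The switchboard argument is partly simple arithmetic. If each of n schemas is mapped directly to every other, and each mapping works in both directions, you need n(n-1)/2 mappings; mapping each schema once to a shared ontology needs only n (assuming the shared ontology can express what every schema needs). If mappings are one-directional, the direct count doubles to n(n-1). A quick calculation:

```python
def pairwise_mappings(n):
    """Mappings needed if every schema is mapped directly to every other one."""
    return n * (n - 1) // 2

def hub_mappings(n):
    """Mappings needed if every schema maps once to a shared ontology."""
    return n

for n in (5, 20, 100):
    print(n, pairwise_mappings(n), hub_mappings(n))
# prints: 5 10 5 / 20 190 20 / 100 4950 100
```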
SPARQL and the Emerging Data-Web
More recently a new Semantic Web technology called SPARQL has also started to
emerge. SPARQL provides a common query language, like SQL, for querying
data that is stored in RDF. Any site or database that has RDF data and
that provides a SPARQL interface can be searched by any application
that speaks SPARQL. This means that the dream of "deep web search" is
finally going to become a reality. There is a huge amount of interest
in SPARQL at the moment and there are already a growing number of
SPARQL endpoints popping up around the Web. These new SPARQL endpoints
are to data what websites were to documents. It’s the beginning of what
some call "The Data Web" — which is the first step to the full-blown
Semantic Web. SPARQL is also a big piece of what we are doing.
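To give a feel for what a SPARQL query does, the sketch below implements a tiny matcher for triple patterns with ?variables, which is the core idea behind a SPARQL basic graph pattern. It is a toy, not a SPARQL engine: there is no parser, no FILTER or OPTIONAL, and no endpoint protocol, and the ex: data is hypothetical. The roughly equivalent SPARQL query (assuming a PREFIX declaration for ex:) is shown in a comment.

```python
# Roughly equivalent SPARQL:
#   SELECT ?name WHERE { ?p a ex:Person . ?p ex:livesIn ex:Paris_Texas . ?p ex:name ?name }
triples = {
    ("ex:p1", "a", "ex:Person"),
    ("ex:p1", "ex:livesIn", "ex:Paris_Texas"),
    ("ex:p1", "ex:name", "Jane"),
    ("ex:p2", "a", "ex:Person"),
    ("ex:p2", "ex:livesIn", "ex:Paris_France"),
    ("ex:p2", "ex:name", "Marie"),
}

def is_var(term):
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:  # variable already bound to something else
                return None
        elif p != t:
            return None
    return new

def query(patterns, data):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in data:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

results = query([("?p", "a", "ex:Person"),
                 ("?p", "ex:livesIn", "ex:Paris_Texas"),
                 ("?p", "ex:name", "?name")], triples)
print([r["?name"] for r in results])  # ['Jane']
```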
Reasoning: The Next Frontier After Search
Another key benefit of using RDF/OWL is that these languages are designed to support formal logical reasoning. Once information is marked up with RDF/OWL, sophisticated search and inferencing can take place around it. For example, by marking up various people and their social connections, it becomes possible to infer that Sue is Jane’s cousin or that Bob and Dave are colleagues; marking up products in the same way makes it possible to infer that product A is incompatible with product B.
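A minimal forward-chaining sketch of the first two inferences, using hand-written rules over hypothetical facts. In practice such rules would be stated in an ontology or a rule language and applied by a reasoner; hard-coding them in Python here only shows the idea. The product-compatibility example would need domain rules of its own and is left out.

```python
# Hypothetical family and workplace facts as (subject, predicate, object) triples.
facts = {
    ("Sue", "hasParent", "Ann"),
    ("Jane", "hasParent", "Tom"),
    ("Ann", "sibling", "Tom"),
    ("Bob", "worksFor", "Acme"),
    ("Dave", "worksFor", "Acme"),
}

def infer(triples):
    """Apply two simple rules once and return only the newly inferred facts."""
    parents = {}  # child -> set of that child's parents
    for s, p, o in triples:
        if p == "hasParent":
            parents.setdefault(s, set()).add(o)
    siblings = {(s, o) for s, p, o in triples if p == "sibling"}
    siblings |= {(o, s) for s, o in siblings}  # sibling is symmetric
    new = set()
    # Rule 1: if a parent of X and a parent of Y are siblings, X and Y are cousins.
    for x, xs in parents.items():
        for y, ys in parents.items():
            if x != y and any((a, b) in siblings for a in xs for b in ys):
                new.add((x, "cousin", y))
    # Rule 2: two different people who work for the same organization are colleagues.
    jobs = [(s, o) for s, p, o in triples if p == "worksFor"]
    for a, org_a in jobs:
        for b, org_b in jobs:
            if a != b and org_a == org_b:
                new.add((a, "colleague", b))
    return new - triples

for fact in sorted(infer(facts)):
    print(fact)
# ('Bob', 'colleague', 'Dave')
# ('Dave', 'colleague', 'Bob')
# ('Jane', 'cousin', 'Sue')
# ('Sue', 'cousin', 'Jane')
```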
This kind of logical reasoning and inference is essential to enable the next generation of the Web, an Intelligent Web, where software and online services start to help people work, communicate, socialize and shop more productively. For example, it will enable something beyond search: services that provide answers or suggestions. This is not necessarily important for all applications today, but it will become increasingly important in the future. Content that exists in RDF/OWL essentially has a longer shelf life and will be easier to reuse, integrate and reason across in the future.
Differentiating The Players
The Semantic Web provides a comprehensive and growing framework of
technologies that enable the next evolution of the Web — it is therefore a
much broader and farther-reaching vision than natural language search,
even though that is certainly one area that it will benefit. Natural language search is really just about matching search queries to documents, by analyzing the meaning of words. The Semantic Web is about defining the meaning of data — any data — words, data records, documents, social relationships, product listings, etc. — and providing a way to query that data, integrate it, and reason across it.
In our own application and platform we make use of a lot of natural language processing (NLP), and we also provide semantic search capabilities, but our focus is on something quite different from searching the Web, yet equally useful and important to everyone. Frankly, I’m glad we are not working on search, as big an opportunity as that is. I think competing directly with Google is a daunting task, and not one I would want to take on! Instead, we are providing a new environment in which people can start to benefit from the power of the Semantic Web in areas where Google is very weak today, or in some cases not present at all; it’s really quite orthogonal to Google and other search engines.
So from the above discussion it should be clear that we are working on the Semantic Web, not just natural language search, and so we are quite different from companies like Powerset, TextDigger and others who are working on word-level semantic understanding of text. But what about Metaweb? How do we differ from them? From what we can glean so far, what we are doing is also very different from what they are doing, though perhaps not as different as we are from Powerset.
Radar Networks and Metaweb are
frequently cited as the two main startups working to bring
semantically-driven Web 3.0 online services to consumers. My guess is
that there will be some similarities but even more differences. There
may even be opportunities for us to work together someday. But we’re
all still in stealth, so it’s hard to get very specific about our
similarities and differences today. One thing is for sure: 2007 is going to be an exciting year for both our companies, and for the emerging Web 3.0 generation of companies and products.
Web 3.0 Is Just Beginning
In any case, the next evolution of the Web, what we call "The Intelligent Web" (and what many are also calling "Web 3.0"), is still in the very early stages, and I don’t think it will really hit big until 2010 (for a graphical timeline of how I think this will unfold, click here). In the meantime we are all putting the pieces in place.
Fortunately, Web 3.0 is a big space with a lot of opportunity, and there is room for many different players and business models to coexist and compete. The fact that there are now several ventures in this space is a good thing for all of us, for as one person said to me the other day, "a rising tide lifts all boats." I’m happy that there is enough action for there to actually be some confusion for me to clear up! Only a year ago it felt like we were the only commercial voice in a wilderness of academic research. Today VCs are lining up to speak to us and the other companies in the space, and we are literally having to keep them at bay until we start our B round.
Solving Information Overload
The key realization behind all this recent interest in semantics is that keyword
search and traditional content and data representations are declining
in productivity. As the Web gets vaster and more complex, and as
consumers must work with a growing array of content and services,
productivity is seriously being threatened — not only in search, but
also in every other area of our digital lives. Most of us who work
intensively with knowledge and information already have a direct and
intuitive experience of how information overload has grown, even in the
last decade. Clearly something must be done about this or in another
few years we will all be buried in our own information.
The Semantic Web provides the best (and really the only) long-term solution to information overload and complexity. Adding richer semantics to data, and enabling applications to leverage them, will make it possible to help people regain more of their productivity and to make software smarter, without having to attempt to create super-duper science fiction artificial intelligence.
It’s very important to keep in mind that The Semantic Web does not require that machines understand or reason as well as people — the semantics of the Semantic Web can be created by people and/or machines, and it doesn’t have to be perfect, it simply has to add hints that make content less ambiguous and more structured. By contrast, both the keyword approach of Google and the natural language search approach of companies like Powerset — if they are to keep up with the growing complexity of the Web — will require increasingly intelligent software, because basically in such systems the software has to do all the work by itself.
The Semantic Web is really more about leveraging the collective intelligence of people and applications to enrich content, rather than trying to make applications do all the work on their own, but this will become a lot clearer later in the process, once there are several Semantic Web apps that demonstrate it.
Here at Radar Networks we have
been working towards this vision steadily — and we’re proud of the fact that we started working with semantics long before it was "cool" — we know this space inside out
and we think that our first application on our platform will be an "Aha
experience" for users.
It certainly has taken some time to bring the Semantic Web to
fruition, but when you think about it, Web 1.0 took about 5 years to
really get started, so it’s not without precedent. A new generation of
the Web is a big undertaking. For now, all of us working on anything having to do with "semantics"
or Web 3.0 need to work together to start mapping out this space and educating the marketplace so
that people (including the press, VCs, and early adopters) can understand the companies and technologies more clearly. The rather humorous irony for all of us is that the meaning of the term "semantic" is still so