Great Collective Intelligence Book; Includes a Chapter I Wrote

I highly recommend this new book on Collective Intelligence. It features chapters by a Who’s Who of thinkers on Collective Intelligence, including a chapter by me about “Harnessing the Collective Intelligence of the World Wide Web.”

Here is the full-text of my chapter, minus illustrations (the rest of the book is great and I suggest you buy it to have on your shelf. It’s a big volume and worth the read):


Harnessing the Collective Intelligence of the World-Wide Web

Nova Spivack[1]


We are about to enter the third decade of the Web, sometimes referred to as “Web 3.0.” During this decade, the Web will evolve from a globally distributed fileserver into a globally distributed database. This shift will be enabled by a set of emerging technologies called The Semantic Web, which add a new layer of machine-understandable metadata about the meaning of information to the content of the Web.

The Semantic Web will catalyze a new era in collective intelligence. Individuals, groups, organizations and communities will be able to create, connect, find and share knowledge more intelligently and productively than ever before. Ultimately it will enable the Web itself, and all the people and applications that participate in it, to become more collectively intelligent.

Web 3.0—The Third Decade of the Web

The third decade of the Web, “Web 3.0,” begins officially in 2010, but we are already entering the early stages of this transition today. To understand where the Web is headed, it helps to zoom out to a larger historical context.

The final decade of the PC-era (1980–1990) was largely concerned with innovation on the front-end of the personal computer: the desktop and user interface layer of the PC. The focus of this period was on making PCs easier to use with innovations such as Microsoft Windows, the Macintosh user-interface, and more consistent user-interfaces and integration across applications.

The first decade of the Web-era (“Web 1.0,” from 1990–2000) was focused on the back-end of the Web: the core technologies and platforms of the Web such as HTML, HTTP, Web servers, search engines, commerce technologies, advertising technologies, and the basic architectures and business models of Web applications. This decade was mainly focused on the technology and infrastructure of the Web, and most of the actual innovation dollars were spent on making things that only software developers could see.

In contrast, the second decade of the Web (“Web 2.0” from 2000—2010) has been largely focused on the front-end of the Web. Much of the innovation has not been on actual technology but rather on design patterns and user-interfaces for improving the end-user experience of the Web. During this decade we have focused on paradigms such as AJAX, which is a set of technologies and design methodologies for making Web sites more visually appealing and interactive.

Another big focus of Web 2.0 has been user-generated content, and in particular the practice of “tagging” content with subject tags. Tagging has in turn led to the concept of “folksonomies” in which taxonomies that organize data are evolved in a bottom-up fashion by a decentralized community of users.
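
To make the folksonomy idea concrete, here is a minimal sketch in Python. It assumes tagging activity can be represented as simple (user, item, tag) events; the users, pages, tags, and function names are all invented for illustration and do not come from any particular tagging service. No one defines the categories in advance. They emerge from counting what the community actually does.

```python
from collections import Counter, defaultdict

# Hypothetical tagging events: (user, item, tag). All names are illustrative.
taggings = [
    ("alice", "page1", "semantic-web"),
    ("bob", "page1", "semantic-web"),
    ("carol", "page1", "rdf"),
    ("alice", "page2", "travel"),
    ("bob", "page2", "hotels"),
    ("carol", "page2", "travel"),
]


def build_folksonomy(events):
    """Aggregate individual tagging events into per-item tag counts."""
    tags_by_item = defaultdict(Counter)
    for _user, item, tag in events:
        tags_by_item[item][tag] += 1
    return tags_by_item


def top_tags(folksonomy, item, n=1):
    """Return the n tags the community applied most often to an item."""
    return [tag for tag, _count in folksonomy[item].most_common(n)]


folksonomy = build_folksonomy(taggings)
print(top_tags(folksonomy, "page1"))  # ['semantic-web']
print(top_tags(folksonomy, "page2"))  # ['travel']
```

The dominant tag for each page is simply the one most users chose, which is the bottom-up quality that distinguishes a folksonomy from a centrally designed taxonomy.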

The coming third decade of the Web (“Web 3.0,” from 2010–2020) will shift the emphasis back to the back-end of the Web. This decade will be largely focused on upgrading the technical infrastructure and content of the Web, based on emerging technologies such as the Semantic Web. During this decade the primary push will be enriching the Web so that it can function more like a database.

Today the Web is composed mainly of unstructured and semistructured data such as text files and Web pages. Keyword search engines are able to provide rudimentary search capabilities over this information, but only for the most simplistic queries. Compare current Web search to the more precise capabilities of queries against a database and the difference is immediately clear. The Web does not provide anything close to the search capabilities or precision of a database today. But that is about to change.

The Semantic Web provides a way to enrich both unstructured and structured data so that it can be queried with the precision of a database. Essentially, it provides a way to tag any information with metadata that explains what it means—and this metadata can be understood by software applications, such as search engines or knowledge management applications. It’s important to note that The Semantic Web is not a new Web, it’s just a new layer of the Web we already have. The semantic metadata that comprises the knowledge of the Semantic Web won’t live in some new place—it lives right in the existing documents and data on the Web. The knowledge of the Semantic Web is encoded using special new markup languages such as RDF and OWL.

This metadata is invisible to users (it doesn’t appear in Web browsers) but behind the scenes it can be read by any application that is compatible with these markup languages. So when any application, such as a next-generation search engine, sees a Web page or data record that contains RDF or OWL metadata, it can then use that metadata to understand what that page or data record means, what it is about, what it is related to, and how to interpret it. With Semantic Web metadata in place, searches on the Web will be as precise as, or even more precise than, those in any database. But that is just the beginning of what the Semantic Web enables. Beyond merely improving search, the Semantic Web actually transforms the Web into a database: a worldwide database in which data records can be moved around, shared, and linked together in new ways.
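
As a rough illustration of what querying with database-like precision means, the sketch below models metadata as (subject, predicate, object) triples, which is the basic shape of RDF statements. It is a toy in plain Python, not real RDF syntax: the identifiers such as "page:42" and the `match` helper are assumptions made up for this example, and a real application would use URIs and a query language such as SPARQL.

```python
# Each fact is a (subject, predicate, object) triple, the basic shape of RDF data.
# The identifiers below are invented for illustration; real RDF uses URIs.
triples = {
    ("page:42", "type", "Article"),
    ("page:42", "author", "person:nova"),
    ("page:42", "topic", "SemanticWeb"),
    ("page:77", "type", "Article"),
    ("page:77", "topic", "Travel"),
    ("person:nova", "worksFor", "org:radar"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "Which pages are about the Semantic Web?" is answered by structure,
# not by guessing from keywords in the page text.
about_semweb = [s for s, _p, _o in match(triples, predicate="topic", obj="SemanticWeb")]
print(about_semweb)  # ['page:42']
```

A keyword engine would have to infer the topic from the words on the page. Here the topic is an explicit statement attached to the page, so the query either matches or it does not.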

On the basis of the technologies of The Semantic Web and the Web 3.0 era, we will then be able to enter the fourth decade of the Web (“Web 4.0,” from 2020–2030) in which the emphasis will turn back to the front-end of the Web. The Semantic Web doesn’t just add metadata about the meaning of information to the Web; it also enables metadata to be added about relationships, conceptual linkages, logical connections, and even logical rules. On the basis of this additional metadata, Web users and other applications will be able to harness the power of intelligent agents that will search the Web for things that interest them, make suggestions and recommendations, and even potentially transact on their behalf. This will open the door to a new kind of user-interface to the Web that is smarter and more conversational in nature, in which users will enter into dialogues with agents and interact with them to search the Web and make decisions. A conversational interface to the Web will be more appropriate in the increasingly mobile world, when users will mostly interact with the Web from small portable mobile or embedded devices.
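
The mention of logical rules above can be made concrete with a small sketch. It assumes facts are stored as (subject, predicate, object) triples and applies a single hand-written rule, transitivity, until no new facts appear. The hotel and place names and the `infer_transitive` function are hypothetical; real Semantic Web systems express such rules in languages like OWL and use dedicated reasoners rather than a loop like this.

```python
def infer_transitive(facts, predicate):
    """Apply the rule (a p b) and (b p c) => (a p c) until nothing new appears."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        # Snapshot the current pairs so the set can grow safely inside the loops.
        pairs = [(s, o) for s, p, o in inferred if p == predicate]
        for a, b in pairs:
            for b2, c in pairs:
                if b == b2 and (a, predicate, c) not in inferred:
                    inferred.add((a, predicate, c))
                    changed = True
    return inferred


# Invented example facts: nobody stated that the hotel is in Japan.
facts = {
    ("hotel:sunrise", "locatedIn", "city:kyoto"),
    ("city:kyoto", "locatedIn", "country:japan"),
}

closure = infer_transitive(facts, "locatedIn")
print(("hotel:sunrise", "locatedIn", "country:japan") in closure)  # True
```

An agent working from the closure can answer "which hotels are in Japan?" even though that fact was never published directly, which is the kind of behavior that rule metadata is meant to enable.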

Users on mobile devices that have little to no screen real-estate will need a more productive way to interact with the Web than through a miniature browser; nobody likes sorting through pages of Google results on a cell phone. Instead, they will want to simply ask a question (perhaps through a voice interface, rather than typing with their thumbs) and have a virtual intelligent assistant dispatch agents to find the best answers and then report back to them with results, further questions, or a request for a decision.

Smart, interactive conversational interfaces and intelligent agent-based virtual assistants are possible today, but only in narrow domains. In the Web 4.0 era they may in fact be our primary way of interacting with the whole Web and may be built into the user interface of most search engines, personal email providers, and leading Websites.

The Virtualization of Knowledge and Intelligence

In the long-term, the Semantic Web provides a way to move much of the “intelligence” that currently resides in the minds of individuals, groups and organizations, and/or that is hard-coded into various software and Web applications, out onto the Web itself. It provides a way to virtualize knowledge and intelligence in an explicitly machine-readable, universally accessible form. In other words, it provides a way to start making the Web “smarter.”

Knowledge and expertise that previously only existed in people’s heads, or had to be painstakingly coded into each particular vertical software application, will be represented in a form of universally readable metadata on the Web—just like HTML documents today. In other words, using the Semantic Web you can publish knowledge and even the underlying conceptual frameworks, rules and heuristics that embody domain expertise, on the Web in an abstract, machine-readable form.

There are many benefits that stem from this. For one thing, it will make it much easier to write smart software applications because much of the necessary “smarts” will not reside in the applications at all, but will rather live out there on the Web.

For example, to write an application that can intelligently assist with travel logistics, a developer will simply be able to point it at existing sets of knowledge and rules that exist for the travel domain on the Web already. The application will be able to draw on those pools of existing domain-knowledge without having to be specifically programmed to do so, because it understands the underlying standards of the Semantic Web. Similarly, the same application could just as easily help someone trade on the stock market, by simply pointing to domain knowledge on the Semantic Web about finance and investment.

As more pools of domain knowledge are added to the Web around various verticals, all applications will potentially benefit. This sets up a kind of network effect in which a global knowledge commons begins to form and self-amplify over time. For example, first the travel domain is added to the Semantic Web. Then someone else adds domain knowledge about geography and links them together. Another group then adds domain knowledge about hotels, and another one adds domain knowledge about weather—and these all connect to each other in various ways.

With all of this interconnected knowledge on the Web in machine-readable form, application developers can then more easily and quickly write applications that understand concepts and rules related to booking travel reservations, and that can cross-reference reservation information with knowledge about geographic places, relevant weather, and hotels in those locations. And in the other direction, someone booking a hotel can then find information about relevant weather and book travel to get to that hotel. This is just one example. There is a practically unlimited range of other possibilities for these technologies across all domains.
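
The travel example can be sketched in a few lines of Python. The assumption here is that each domain publishes its knowledge as (subject, predicate, object) triples and that the domains share identifiers for the same things, such as a city. All data, predicate names, and helper functions below are invented for illustration.

```python
# Four independently published "pools" of domain knowledge (all data is invented).
travel = {("flight:101", "arrivesAt", "city:lisbon")}
geography = {("city:lisbon", "inCountry", "country:portugal")}
hotels = {("hotel:alfama", "locatedIn", "city:lisbon")}
weather = {("city:lisbon", "forecast", "sunny")}

# Because every pool uses the same triple shape and shared identifiers,
# combining them is just a set union.
commons = travel | geography | hotels | weather


def objects(store, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return sorted(o for s, p, o in store if s == subject and p == predicate)


def subjects(store, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the store."""
    return sorted(s for s, p, o in store if p == predicate and o == obj)


def trip_summary(store, flight):
    """Cross-reference a flight with hotels and weather at its destination."""
    summary = []
    for city in objects(store, flight, "arrivesAt"):
        summary.append({
            "city": city,
            "hotels": subjects(store, "locatedIn", city),
            "forecast": objects(store, city, "forecast"),
        })
    return summary


print(trip_summary(commons, "flight:101"))
```

Note that `trip_summary` contains no hotel or weather data of its own. The application stays thin because the knowledge lives in the shared pools, and adding a fifth pool would not require changing the merge step.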

The key point of all this is that The Semantic Web enables applications to become thinner, yet at the same time smarter, by drawing on the collective intelligence embodied by the Web itself. It will become possible to write applications that understand one or more specialized vertical domains faster, and ultimately applications will become more general—they will be able to dynamically load in specialized domain knowledge for whatever domain is needed, without having to be specifically programmed or limited to just those domains.

Application developers will be able to draw on the knowledge added to the Web by others, instead of having to reinvent the wheel by programming all that knowledge directly into their applications every time. And in turn, the knowledge that their applications create can, if they want to allow it, be published back onto the Web for other applications to draw on as well.

Semantic Web as The Next Leap in Human Collective Intelligence

Looking at the evolution of the Semantic Web in historical context, we can view it as the next big step in a longer process of the evolution of human collective intelligence.

Before the invention of written language, knowledge could only be communicated verbally and was handed down through oral traditions. During this period, one had to be in immediate physical proximity of someone who had certain knowledge in order to receive it from them. This meant that the maximum effective range of human collective intelligence was quite short in space and time.

With the invention of writing, and eventually printing, humanity was able to process knowledge over longer distances in space and time, and with less reliance on particular individuals. People could now engage in dialogues and dialectics with larger groups of people in more places, across larger distances in space, and with more precision over larger ranges of time.

The printing press took this to a new level by starting the process of mass-distribution of knowledge, but it still relied on an expensive physical manufacturing process and a paper medium that was perishable and costly to store and move around.

With the advent of electronic communications of various forms, humanity achieved many milestones: the transmission of knowledge could take place at the speed of light, and using digital storage media we were freed from the limitations of the paper medium.

The Internet and the Web transformed the process of distributing knowledge even further—enabling a global knowledge commons to emerge. The Internet and Web enable anyone and everyone to become providers of knowledge, not just consumers—a fundamental shift in the way that knowledge transmission and media function. They are not just about the mass-distribution and mass-consumption of knowledge; they enable the mass-creation of knowledge. In some respects these technologies are analogues of the printing press in that they have democratized the process of creating, sharing and accessing knowledge by fundamentally changing the economics of the entire process—making it affordable and accessible to all.

But even on the Web, for all its many benefits, knowledge is still not free from the limitations of the human brain. Only humans can really understand the knowledge that is represented in Web sites and databases, for example. While all other processes related to the distribution, storage and access to knowledge can now be done digitally, using software and the Web, the processes of creating, consuming and actually understanding knowledge are still limited only to living humans. That’s where the Semantic Web comes in.

Liberating Knowledge and Intelligence from Human

The Semantic Web virtualizes human knowledge and expertise outside of human brains, and even outside of any particular software application—knowledge becomes essentially just more data on the Web. When we speak of knowledge here we don’t just mean information—the first-order raw data that is currently on the Web—we mean the actual meaning and interpretation of the information that is not on the Web but rather exists only in human brains.

The Semantic Web provides a way to make the meaning and interpretation of information explicit in a form that is unambiguous and publishable, and shareable, on the Web. This will make all this knowledge understandable by software. It’s almost like the invention of a new language—a sort of meta-language for formally expressing what exactly you mean when you say something. The impact of this could be enormous.

For the first time in human history, we won’t have to rely only on humans to create, understand and consume knowledge—our machines will be able to help us do this. They will help us work, collaborate, create, explore, monitor, discover, search, innovate, connect, and synthesize. This will open the door to an almost unimaginable amplification of the human mind, and human collective intelligence on this planet. At first the impact of this will largely be focused around assisting humans with simple clerical and research tasks, but the process will inevitably continue to evolve to a point where software will begin to originate new knowledge for us, advise us, and eventually to even start making certain types of decisions on our behalf.

Although the Semantic Web has barely moved from the lab to the mainstream Internet, it is in fact much farther along than most people realize. Today there are already semantic applications under development that can organize all your information automatically, make recommendations based on your dynamically changing interests, identify new connections between ideas or documents in different places, make logical inferences or discover contradictions, and even make discoveries by doing proofs and explorations based on available data.

Within a few years these capabilities will begin to filter out to the mainstream users of the Internet, and within a decade or two at most, they will become commonplace. There are only a few billion humans today, and each of us can only cope with a small amount of information and relationships before we become overloaded. But in an era of machine understanding of human knowledge we may potentially be able to leverage thousands to millions of software agents to help us. This will vastly increase our ability to cope with masses of information and relationships productively. In an increasingly complex, distributed, and rapidly changing world, we simply will not be able to cope in the future without help. The Semantic Web provides one path to solving these problems, enabling us to remain productive in the future.

Amplifying Human Collective Intelligence

The Semantic Web does not replace humans or take them out of the equation. It simply reduces the load on humans, freeing them from some of the pain of information overload, and providing a new path for software to begin to augment and even amplify human collective intelligence.

Today there are several barriers to human collective intelligence that arise from basic limitations of the human brain. Human individuals, and groups of humans, simply cannot process or share knowledge effectively beyond a certain level of information or relationship complexity and change. For this reason, collaboration and collective intelligence are often easier to achieve and yield better results in small groups than large groups.

As group size increases, productive collective intelligence becomes dramatically harder to achieve. Thus, ironically even though larger groups offer the potential for exponential increases in collective intelligence, in practice the opposite is usually the result: the larger teams get, the dumber they get. An entire industry of management consultants and facilitators exists because of these inefficiencies.

The Semantic Web may be able to help with this age-old problem. By enabling software to understand information and relationships, we may be able to begin to automatically and intelligently facilitate interpersonal and group collaboration and knowledge management, and this may finally enable larger groups to become exponentially smarter instead of dumber.

A New Service for Collective Intelligence

My own company, Radar Networks, has recently introduced a new service based on the Semantic Web, called Twine, that focuses on amplifying human collective intelligence. Twine helps individuals and groups manage and share knowledge more productively, using the Semantic Web.

As people use Twine it learns from them and automatically organizes and connects their information with other related information, saving them valuable time and enabling them to discover connected knowledge. Twine provides individuals and groups with a smart virtual environment for their knowledge.

Twine works with all kinds of knowledge—email, RSS, Web pages, documents, photos, videos, audio, contact records, or anything else. Regardless of where information actually resides, Twine enables users to view it as if it were in one place, and to see how it is connected and organized. Twine also automatically helps to make sense of information and to make it more easily searchable.

Twine is a Web-based online service that is completely built using the Semantic Web. Although it is only in early beta-testing at the time of this writing, it is already demonstrating that intelligent machine-augmentation of individual and group knowledge management is possible and improves productivity and collaboration.

As Twine unfolds and spreads to more individuals, groups and teams, and organizations and communities, it has the potential to become a new backbone for collective intelligence and knowledge sharing worldwide. At least that is the vision of the project. Time will tell whether we succeed.

From Global Knowledge Commons to Global Brain

If the Semantic Web develops as predicted, it is possible that within 20 years much, if not all, human knowledge will be represented on the Web in machine-understandable form. We have seen the beginnings of this trend with services such as the Wikipedia. More recently, another initiative called the DBpedia is creating a Semantic Web version of the Wikipedia. But this is just the start of this trend.

As more and more applications and services start producing Semantic Web metadata and exposing it back to other applications and services on the Web, we will begin to create a new global knowledge commons. At first these different services will function like islands of knowledge, but then they will begin to interconnect.

A piece of knowledge in one place will link to and from pieces of knowledge in other places. Eventually this will become a giant associative network, not so unlike the brain, but on a global scale. And as people and applications surf through its connections and consume its knowledge, adding new knowledge and connections back to it as they do, it will change and self-organize dynamically. Just as the first generations of the Web have enabled a global medium for “hypertext,” the Semantic Web will enable a global medium for “hyperdata.”

As one projects the future evolution of the Web and the emerging Semantic Web, one cannot help but notice certain similarities to the human mind. Some have even ventured to call this the beginning of an emerging “Global Brain.” It is too early to tell how similar it will truly be to the actual human brain. However, we can already predict with confidence that it will be a system that collectively will be capable of at least rudimentary learning, memory, perception, planning and reasoning.

The human brain is a massively parallel collective intelligence engine in which billions of neurons interact across trillions of connections to process and generate knowledge.

Similarly, the collective intelligence of the Web will involve the combined interactions and intelligence of billions of humans and machines across trillions of relationships. These processes will not be guided centrally, and the system will most likely not be centralized around a single construct of a “self” nor will it have anything like a human body.

While it will be possible to say the system as a whole is intelligent, it will be difficult to locate any particular source of that intelligence; the intelligence will come from everywhere: from the humans, the software and even the data and links that comprise the Web.

Because the Web is quite different from the human brain, it is likely that its intelligence will be different from what we think of as human intelligence today. But it will nonetheless be intelligent—in a massively distributed, emergent, and chaotic way that we humans may not be able to even comprehend. The “thoughts” the Web will think may be just too vast and complex for us to even recognize, let alone imagine or understand. Yet perhaps in decade-long time-scales at least, we will begin to be able to see the outlines of its thinking.

Nova Spivack is the CEO and founder of Radar Networks, a San Francisco company that is pioneering applications of the Semantic Web for distributed collaboration and knowledge management with a new service called Twine. Mr. Spivack is a recognized authority on the Semantic Web and future of the Web, which is sometimes called “Web 3.0.” A more detailed bio can be found at his company website:

3 Responses to Great Collective Intelligence Book; Includes a Chapter I Wrote

  1. Mgccl says:

    I found natural language flawed
    people should construct a better language, like Lojban, so computers can fully understand humans.

  2. Well said, very well said indeed. I wonder how many people will truly grasp the nuances of the future you allude to within this article. For instance, the future of the currently centralized search engine triumvirate is on shaky ground – specially if they do not evolve to embrace the power offered by distributed semantic (or meta data) searching. Of course I suspect they will adapt, but the financial models of commercializing the traffic will have to change too 🙂

  3. Rob Bryanton says:

    Hi Nova, what an exciting article, and I can hardly wait for my Twine registration to be processed. I have written an entry today in the Imagining the Tenth Dimension blog called “Collective Intelligence, Cognitive Surplus and Chaos”; this is my attempt to tie together the ideas of information space and probability space with the implications of the Semantic Web and our increasingly collaborative world. I’ve also created a permanent link to your blog in my blog’s Interesting Links section, as this is an important piece of the puzzle in trying to imagine how it all fits together.
    Thanks for the inspiring writing!
    Rob Bryanton