The Rise of the Chief Analytics Officer

My new article on the growing role of the Chief Analytics Officer has gone live on GigaOm. Read it here.

Bottlenose Announces Free Live Visualization of Global Social Trends

Bottlenose has just launched something very cool: a free version of its live visualization of trends in the Twitter firehose. Check it out at http://sonar.bottlenose.com and get your own embed for any topic. This is the future of real-time marketing. And by the way, it’s also an awesome visualization of the global mind as it thinks collective thoughts.

Making Sense of Streams

This is a talk I’ve been giving on how we filter the Stream at Bottlenose.

You can view the slides below, or click here to replay the webinar with my talk.

Note: I recommend the webinar if you have time, as I go into a lot more detail than is in the slides – in particular some thoughts about the Global Brain, mapping collective consciousness, and what the future of social media is really all about.  My talk starts at 05:38:00 in the recording.

 

Bottlenose Beat Bit.ly to the First Attention Engine – But It’s Going to Get Interesting

Bottlenose (disclosure: my startup) just launched the first attention engine this week.

But it appears that Bit.ly is launching one soon as well.

It’s going to get interesting to watch this category develop. Clearly there is new interest in building a good real-time picture of what’s happening and what’s trending, and in providing search, discovery, and insights around it.

I believe Bottlenose has the most sophisticated map of attention today, and we have very deep intellectual property across 8 pending patents, as well as a very advanced technology stack behind it. And we have some pretty compelling user experiences on top of it all. So in short, we have a lead here on many levels. (Read more about that here.)

But that might not even matter, because I think ultimately Bit.ly will be a potential partner for Bottlenose, rather than a long-term competitor — at least if they stay true to their roots and DNA as a data provider rather than a user-experience provider. I doubt that Bit.ly will succeed in making a search destination that consumers will use, and I’m guessing that is not really their goal.

In testing their Realtime service, my impression is that it feels more like a Web 1.0 search engine: static search results for advanced-search-style queries. I don’t see that as a consumer experience.

Bottlenose, on the other hand, goes much further into a consumer UX, with live photos, newspapers, topic portals, a dashboard, etc. It is also a more dynamic, always-changing, real-time content-consumption destination. Bottlenose feels like media, not merely search (in fact I think search, news and analytics are actually converging in the social network era).

Bottlenose has a huge emphasis on discovery, analytics, and other further actions on the content that go beyond just search.

I think in the end Bit.ly’s Realtime site will really demonstrate the power of their data, which will still mainly be consumed via their API rather than in their own destination. I’m hopeful that Bit.ly will do just that. It would be useful to everyone, including Bottlenose.

The Threat to Third-Party URL Shorteners

If I were Bit.ly, my primary fear today would be Twitter with their t.co shortener. That is a big threat to Bit.ly and will probably result in Bit.ly losing a lot of their data input over time as more Tweets have t.co links on them than Bit.ly links.

Perhaps Bit.ly is attempting to pivot their business to the user-experience side in advance of such a threat reducing their data set and thus the value of their API. But without that data set, I don’t see how they could measure the present. So as a pivot it would not work: where would the data come from?

In other words, if people are not using as many Bit.ly links in the future, Bit.ly will see less attention. And trends point to this happening in fact — Twitter has their own shortener. So does Facebook. So does Google. Third-party shorteners will probably represent a decreasing share of messages and attention over time.

I think the core challenge for Bit.ly is to find a reason for their short URLs to be used instead of native app short URLs. Can they add more value to them somehow? Could they perhaps build in monetization opportunities for parties who use their shortener, for example? Or could they provide better analytics than Twitter or Facebook or Google will on short URL uptake? (Bit.ly arguably does, today.)

Bottlenose and Bit.ly Realtime: Compared and Contrasted

In any case there are a few similarities between what Bit.ly may be launching and what Bottlenose provides today.

But there are far more differences.

These products only partially intersect. Most of what Bottlenose does has no equivalent in Bit.ly Realtime. Similarly much of what Bit.ly actually does (outside of their Realtime experiment) is different from what Bottlenose does.

It is also worth mentioning that Bit.ly’s “Realtime” app is a Bit.ly “labs” project and is not their central focus, whereas at Bottlenose it is 100% of what we do. Mapping the present is our core focus.

There is also a big difference in business model. Bottlenose does map the present in high fidelity, but currently has no plans to provide a competing shortening API, or an API about short URLs, as Bit.ly presently does. So currently we are not competitors.

Also, where Bit.ly currently has a broader and larger data set, Bottlenose has created a more cutting-edge and compelling user-experience and has spent more time on a new kind of computing architecture as well.

The Bottlenose StreamOS engine is worth mentioning here: Bottlenose has a new engine for real-time big data analytics that uses a massively distributed, patent-pending “crowd computing” architecture.

We actually have built what I think is the most advanced engine and architecture on the planet for mapping attention in real-time today.

The deep semantics and analytics we compute in realtime are very expensive to compute centrally. Rather than compute everything in the center we compute everywhere; everyone who uses Bottlenose helps us to map the present.

Our StreamOS engine is in fact a small (just a few megabytes) Javascript and HTML5 app (the size of a photo) that runs in the browser or device of each user. Almost all the computing and analytics that Bottlenose does happens in the browser at the edge.

We have very low centralized costs. This approach scales better, faster, and more cheaply than any centralized approach can. The crowd literally IS our computer. It’s the Holy Grail of distributed real-time indexing.
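To make the idea concrete, here is a minimal sketch of the edge-side “crowd computing” principle. This is not our actual StreamOS code, and the endpoint and message shape are hypothetical: each client analyzes the messages it already receives in its own browser and ships only a tiny aggregate back upstream.

```typescript
// Minimal sketch of edge-side "crowd computing": each client analyzes the
// messages it already receives and ships only small aggregates upstream.
// The endpoint and message shape are hypothetical, not Bottlenose's API.

interface Message { id: string; text: string; }

const topicCounts = new Map<string, number>();

// Very rough topic extraction: count hashtags and capitalized terms locally.
function analyze(message: Message): void {
  const tokens = message.text.match(/#\w+|\b[A-Z][a-z]+\b/g) ?? [];
  for (const token of tokens) {
    const topic = token.toLowerCase();
    topicCounts.set(topic, (topicCounts.get(topic) ?? 0) + 1);
  }
}

// Periodically send the small aggregate (not the raw stream) to the server,
// so central costs stay low while every user contributes to the map.
async function reportAggregates(): Promise<void> {
  const payload = Array.from(topicCounts.entries()).map(([topic, count]) => ({ topic, count }));
  topicCounts.clear();
  await fetch('https://example.com/attention/aggregate', { // hypothetical endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
}

setInterval(reportAggregates, 60_000); // report once a minute
```

The design point is that the raw stream never has to be processed centrally; the server only merges small per-client summaries.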

We also see a broader set of data than Bit.ly does. We don’t only see content that has a bit.ly URL on it. We see all kinds of messages moving through social media — with other short URLs, and even without URLs.

We see Bit.ly URLs, but we also see data that is outside of the Bit.ly universe. I think ultimately it’s more valuable to see all the trends across all data sources, and even content that contains no URLs at all (Bottlenose analyzes all kinds of messages for example, not just messages that contain URLs, let alone just Bit.ly URLs).

Finally, the use-cases for Bottlenose go far beyond just search, or just news reading and news discovery.

We have all kinds of brands and enterprises actually using our Bottlenose Dashboard product, for example, for social listening, analytics and discovery. I don’t see Bit.ly going as deeply into that as we do.

For these reasons I’m optimistic that Bottlenose (and everyone else) will benefit from what Bit.ly may be launching — particularly via their API, if they make their attention data available as an additional signal.

This space is going to get interesting fast.

(To learn more about what Bottlenose does, read this)

 

How Bottlenose Could Improve the Media and Enable Smarter Collective Intelligence

This article is part of a series of articles about the Bottlenose Public Beta launch.

Bottlenose – The Now Engine – The Web’s Collective Consciousness Just Got Smarter

How Bottlenose Could Improve the Media and Enable Smarter Collective Intelligence (you are here)

A New Window Into the Collective Consciousness

Bottlenose offers a new window into what the world is paying attention to right now, globally and locally.

We show you a live streaming view of what the crowd is thinking, sharing and talking about. We bring you trends, as they happen. That means the photos, videos and messages that matter most. That means suggested reading, and visualizations that cut through the clutter.

The center of online attention and gravity has shifted from the Web to social networks like Twitter, Facebook and Google+. Bottlenose operates across all of them, in one place, and provides an integrated view of what’s happening.

The media also attempts to provide a reflection of what’s happening in the world, but the media is slow, and it’s not always objective. Bottlenose doesn’t replace the media — at least not the role of the writer. But it might do a better job of editing or curating in some cases, because it objectively measures the crowd — we don’t decide what to feature, we don’t decide what leads. The crowd does.

Other services in the past, like Digg for example, have helped pioneer this approach. But we’ve taken it further — in Digg people had to manually vote. In Bottlenose we simply measure what people say, and what they share, on public social networks.

Bottlenose is the best tool for people who want to be in the know, and the first to know. Bottlenose brings a new awareness of what’s trending online, and in the world, and how those trends impact us all.

We’ve made the Bottlenose home page into a simple Google-like query field, and nothing more. Results pages drop you into the app itself for further exploration and filtering. Except you don’t just get a long list of results, the way you get on Google.

Instead, you get an at-a-glance start page, a full-fledged newspaper, a beautiful photo gallery, a lean-back home theater, a visual map of the surrounding terrain, a police scanner, and Sonar — an off-road vehicle so that you can drive around and see what’s trending in networks as you please. We’ve made the conversation visual.

Each of these individual experiences is an app on top of the Bottlenose StreamOS platform, and each is a unique way of looking at sets and subsets of streams. You can switch between views effortlessly, and you can save anything for persistent use.

Discovery, we’ve found from user behavior, has been the entry point and the connective tissue for the rest of the Bottlenose experience all along. Our users have been asking for a better discovery experience, just as Twitter users have been asking for the same.

The new stuff you’ll see today has been one of the most difficult pieces for us to build computer-science-wise. It is a true technical achievement by our engineering team.

In many ways it’s also what we’ve been working towards all along. We’re really close now to the vision we held for Bottlenose at the very beginning, and the product we knew we’d achieve over time.

The Theory Behind It: How to Build a Smarter Global Brain

If Twitter, Facebook, Google+ and other social networks are the conduits for what the planet is thinking, then Bottlenose is a map of what the planet is actually paying attention to right now. Our mission is to “organize the world’s attention.” And ultimately I think by doing this we can help make the world a smarter place. At the end of the day, that’s what gets me excited in life.

After many years of thinking about this, I’ve come to the conclusion that the key to higher levels of collective intelligence is not making each person smarter, and it’s not some kind of Queen Bee machine up in the sky that tells us all what to do and runs the human hive. It’s not some fancy kind of groupware either. And it’s not the total loss of individuality into a Borg-like collective either.

I think that better collective intelligence really comes down to enabling better collective consciousness. The more conscious we can be of who we are collectively, and what we think, and what we are doing, the smarter we can actually be together, of our own free will, as individuals. This is a bottom-up approach to collective consciousness.

So how might we make this happen?

For the moment, let’s not try to figure out what consciousness really is, because we don’t know, and we probably never will. But for this adventure, we don’t need to. And we don’t even need to synthesize it either.

Collective consciousness is not a new form of consciousness, rather, it’s a new way to channel the consciousness that’s already there — in us. All we need to do is find a better way to organize it… or rather, to enable it to self-organize emergently.

What does consciousness actually do anyway?

Consciousness senses the internal and external world, and maintains a model of what it finds — a model of the state of the internal and external world that also contains a very rich model of “self” within it.

This self construct has an identity, thoughts, beliefs, emotions, feelings, goals, priorities, and a focus of attention.

If you look for it, it turns out there isn’t actually anything there you can find except information — the “self” is really just a complex information construct.

This “self” is not really who we are, it’s just a construct, a thought really — and it’s not consciousness either. Whatever is aware is aware of the self, so the self is just a construct like any other object of thought.

So given that this “self” is a conceptual object, not some mystical thing that we can’t ever understand, we should be able to model it, and make something that simulates it. And in fact we can.

We can already do this for artificially intelligent computer programs and robots in a primitive way in fact.

But what’s really interesting to me is that we can also do it for large groups of people too. This is a big paradigm shift – a leap. Something revolutionary really. If we can do it.

But how could we provide something like a self for groups, or for the planet as a whole? What would it be like?

Actually, there is already a pretty good proxy for this and it’s been around for a long time. It’s the media.

The Media is a Mirror

The media senses who we are and what we’re doing and it builds a representation — a mirror – in the form of reports, photos, articles, and stats about the state of the world. The media reflects who we are back to us. Or at least it reflects who it thinks we are…

It turns out it’s not a very accurate mirror. But since we don’t have anything better, most of us believe what we see in the media and internalize it as truth.

Even if we try not to, it’s just impossible to avoid the media that bombards us from everywhere all the time. Nobody is really separate from this; we’re all kind of stewing in a media soup, whether we like it or not.

And when we look at the media and we see stories – stories about the world, about people we know, people we don’t know, places we live in, and other places, and events — we can’t help but absorb them. We don’t have first hand knowledge of those things, and so we take on faith what the media shows us.

We form our own internal stories that correspond to the stories we see in the media. And then, based on all these stories, we form beliefs about the world, ourselves and other people – and then those beliefs shape our behavior.

And there’s the rub. If the media gives us an inaccurate picture of reality, or a partially accurate one, and then we internalize it, it then conditions our actions. And so our actions are based on incomplete or incorrect information. How can we make good decisions if we don’t have good information to base them on?

The media used to be about objective reporting, and there are still those in the business who continue that tradition. But real journalists — the kind who would literally give their lives for the truth — are fewer and fewer. The noble art of journalism is falling prey, like everything else, to commercial interests.

There are still lots of great journalists and editors, but there are fewer and fewer great media companies. And fewer rules and standards too. To compete in today’s media mix it seems they have to stoop to the level of the lowest common denominator and there’s always a new low to achieve when you take that path.

Because the media is driven by profit, stories that get eyeballs get prioritized, and the less sensational but often more statistically representative stories don’t get written, or don’t make it onto the front page. There is even a saying in the TV news biz that “If it bleeds, it leads.”

Look at the news — it’s just filled with horrors. But that’s not an accurate depiction of the world. Crimes, for example, don’t happen all the time, everywhere, to everyone – they are statistically quite unlikely and rare — yet so much news is devoted to crime. It’s not an accurate portrayal of what’s really happening for most people, most of the time.

I’m not saying the news shouldn’t report crime, or show scary bad things. I’m just pointing out that the news is increasingly about sensationalism, fear, doubt, uncertainty, violence, hatred, crime, and that is not the whole truth. But it sells.

The problem is not that these things are reported — I am not advocating for censorship in any way. The problem is about the media game, and the profit motives that drive it. Media companies just have to compete to survive, and that means they have to play hard ball and get dirty.

Unfortunately the result is that the media shows us stories that do not really reflect the world we live in, or who we are, or what we think, accurately – these stories increasingly reflect the extremes, not the enormous middle of the bell curve.

But since the media functions as our de facto collective consciousness, and it’s filled with these images and stories, we cannot help but absorb them and believe them, and become like them.

But what if we could provide a new form of media, a more accurate reflection of the world, of who we are and what we are doing and thinking? A more democratic process, where anyone could participate and report on what they see.

What if in this new form of media ALL the stories are there, not just some of them, and they compete for attention on a level playing field?

And what if all the stories can compete and spread on their merits, not because some professional editor, or publisher, or advertiser says they should or should not be published?

Yes this is possible.

It’s happening now.

It’s social media in fact.

But for social media to really do a better job than the mainstream media, we need a way to organize and reflect it back to people at a higher level.

That’s where curation comes in. But manual curation is just not scalable to the vast number of messages flowing through social networks. It has to be automated, yet not lose its human element.

That’s what Bottlenose is doing, essentially.

Making a Better Mirror

To provide a better form of collective consciousness, you need a measurement system that can measure and reflect what people are REALLY thinking about and paying attention to in real-time.

It has to take a big data approach – it has to be about measurement. Let the opinions come from the people, not editors.

This new media has to be as free of bias as possible. It should simply measure and reflect collective attention. It should report the sentiment that is actually there, in people’s messages and posts.

Before the Internet and social networks, this was just not possible. But today we can actually attempt it. And that is what we’re doing with Bottlenose.

But this is just a first step. We’re dipping our toe in the water here. What we’re doing with Bottlenose today is only the beginning of this process. And I think it will look primitive compared to what we may evolve in years to come. Still it’s a start.

You can call this approach mass-scale social media listening and analytics, or trend detection, or social search and discovery. But it’s also a new form of media, or rather a new form of curating the media and reflecting the world back to people.

Bottlenose measures what the crowd is thinking, reading, looking at, feeling and doing in real-time, and coalesces what’s happening across social networks into a living map of the collective consciousness that anyone can understand. It’s a living map of the global brain.

Bottlenose wants to be the closest you can get to the Now, to being in the zone, in the moment. The Now is where everything actually happens. It’s the most important time period in fact. And our civilization is increasingly now-centric, for better or for worse.

Web search feels too much like research. It’s about the past, not the present. You’re looking for something lost, or old, or already finished — fleeting.  Web search only finds Web pages, and the Web is slow… it takes time to make pages, and time for them to be found by search engines.

On the other hand, discovery in Bottlenose is about the present — it’s not research, it’s discovery. It’s not about memory, it’s about consciousness.

It’s more like media — a live, flowing view of what the world is actually paying attention to now, around any topic.

Collective intelligence is theoretically made more possible by real-time protocols like Twitter. But in practice, keeping up with existing social networks has become a chore, and not drowning is a real concern. Raw data is not consciousness. It’s noise. And that’s why we so often feel overwhelmed by social media, instead of emboldened by it.

But what if you could flip the signal-to-noise ratio? What if social media could be more like actual media … meaning it would be more digestible, curated, organized, consumable?

What if you could have an experience that is built on following your intuition, and living this large-scale world to the fullest?

What if this could make groups smarter as they get larger, instead of dumber?

Why does group IQ so often seem inversely proportional to group size? The larger groups get, the dumber and more dysfunctional they become. This has been a fundamental obstacle for humanity for millennia.

Why can’t groups (including communities, enterprises, even whole societies) get smarter as they get larger instead of dumber? Isn’t it time we evolve past this problem? Isn’t this really what the promise of the Internet and social media is all about? I think so.

And what if there was a form of media that could help you react faster, and smarter, to what is going on around you as it happens, just like in real life?

And what if it could even deliver on the compelling original vision of cyberspace as a place you could see and travel through?

What about getting back to the visceral, the physical?

Consciousness is interpretive, dynamic, and self-reflective. Social media should be too.

This is the fundamental idea I have been working on in various ways for almost a decade. As I have written many times, the global brain is about to wake up and I want to help.

By giving the world a better self-representation of what it is paying attention to right now, we are trying to increase the clock rate and resolution of collective consciousness.

By making this reflection more accurate, richer, and faster, and then making it available to everyone, we may help catalyze the evolution of higher levels of collective intelligence.

All you really need is a better mirror. A mirror big enough for large groups of people to look into and see what they are collectively paying attention to in it, together. By providing groups with a clearer picture of their own state and activity, they can adapt to themselves more intelligently.

Everyone looks in the collective mirror and adjusts their own behavior independently — there is no top-down control — but you get emergent self-organizing intelligent collective behavior as a result. The system as a whole gets smarter. So the better the mirror, the smarter we become, individually and collectively.

If the mirror is really fast, really good, really high res, and really accurate and objective – it can give groups an extremely important, missing piece: Collective consciousness that everyone can share.

We need collective consciousness that exists outside of any one person, and outside of any one perspective or organization’s agenda, and is not merely in the parts (the individuals) either. Instead, this new level of collective consciousness should be something that is coalesced into a new place, a new layer, where it exists independently of the parts.

It’s not merely the sum of the parts, it’s actually greater than the sum – it’s a new level, a new layer, with new information in it. It’s a new whole that transcends just the parts on their own.  That’s the big missing piece that will make this planet smarter, I think.

We need this yesterday. Why? Because in fact collectives — groups, communities, organizations, nations — are the units of change on this planet. Not individuals.

Collectives make decisions, and usually these decisions are sub-optimal. That’s dangerous. Most of the problems we’ve faced and continue to face as a species come down to large groups doing stupid things, mainly due to not having accurate information about the world or themselves. This is, ultimately, an engineering problem.

We should fix this, if we can.

I believe that the Internet is an evolving planetary nervous system, and it’s here to make us smarter. But it’s going to take time. Today it’s not very smart. But it’s evolving fast.

Higher layers of knowledge and intelligence are emerging in this medium, like higher layers of the cerebral cortex, connecting everything together ever more intelligently.

And we want to help make it even smarter, even faster, by providing something that functions like self-consciousness to it.

Now I don’t claim that what we’re making with Bottlenose is the same as actual consciousness — real consciousness is, in my opinion, a cosmic mystery like the origin of space and time. We’ll probably never understand it. I hope we never do. Because I want there to be mystery and wonder in life. I’m confident there always will be.

But I think we can enable something on a collective scale, that is at least similar, functionally, to the role of self-consciousness in the brain — something that reflects our own state back to us as a whole all the time.

After all, the brain is a massive collective of roughly a hundred billion neurons and trillions of connections that themselves are not conscious or even intelligent – and yet it forms a collective self and reacts to itself intelligently.

And this feedback loop – and the quality of the reflection it is based on – is really the key to collective intelligence, in the brain, and for organizations and the planet.

Collective intelligence is an emergent phenomenon; it’s not something to program or control. All you need to do to enable it and make it smarter is give groups and communities better-quality feedback about themselves. Then they get smarter on their own, simply by reacting to that feedback.

Collective intelligence and collective consciousness are, at the end of the day, a feedback loop. And we’re trying to make that feedback loop better.

Bottlenose is a new way to curate the media, a new form of media in which anyone can participate but the crowd is the editor. It’s truly social media.

This is an exciting idea to me. It’s what I think social media is for and how it could really help us.

Until now people have had only the mainstream, top-down, profit-driven media to look to. But by simply measuring everything that flows through social networks in real time, and reflecting a high-level view of that back to everyone, it’s possible to evolve a better form of media.

It’s time for a bottom-up, collectively written and curated form of media that more accurately and inclusively reflects us to ourselves.

Concluding Thoughts

I think Bottlenose has the potential to become the giant cultural mirror we need.

Instead of editors and media empires sourcing and deciding what leads, the crowd is the editor, the crowd is the camera crew, and the crowd decides what’s important. Bottlenose simply measures the crowd and reflects it back to itself.

When you look into this real-time cultural mirror that is Bottlenose, you can see what the community around any topic is actually paying attention to right now. And I believe that as we improve it, and if it becomes widely used, it could facilitate smarter collective intelligence on a broader scale.

The world now operates at a ferocious pace and search engines are not keeping up. We’re proud to be launching a truly present-tense experience. Social messages are the best indicators today of what’s actually important, on the Web, and in the world.

We hope to show you an endlessly interesting, live train of global thought. The first evolution of the Stream has run its course and now it’s time to start making sense of it on a higher level. It’s time to start making it smart.

With the new Bottlenose, you can see, and be a part of, the world’s collective mind in a new and smarter way. That is ultimately why Bottlenose is worth participating in.

Keep Reading

Bottlenose – The Now Engine – The Web’s Collective Consciousness Just Got Smarter

How Bottlenose Could Improve the Media and Enable Smarter Collective Intelligence (you are here)

 

Bottlenose – The Now Engine – The Web’s Collective Consciousness Just Got Smarter

Recently, one of Twitter’s top search engineers tweeted that Twitter was set to “change search forever.” This proclamation sparked a hearty round of speculation and excitement about what was coming down the pipe for Twitter search.

The actual announcement featured the introduction of autocomplete and the ability to search within the subset of people on Twitter that you follow — both long-anticipated features.

However, while certainly a technical accomplishment (Twitter operates at a huge scale and building these features must have been very difficult), this was an iterative improvement to search… an evolution, not a revolution.

Today I’m proud to announce something that I think could actually be revolutionary.

 

And here’s the video….

 

My CTO/Co-founder, Dominiek ter Heide, and I have been working for 2 years on an engine for making sense of social media. It’s called Bottlenose, and we started with a smart social dashboard.

Now we’re launching the second stage of our mission “to organize the world’s attention” — a new layer of Bottlenose that provides a live discovery portal for the social web.

This new service measures the collective consciousness in real-time and shows you what the crowd is actually paying attention to now, about any topic, person, brand, place, event… anything.

If the crowd is thinking about it, we see it. It’s a new way to see what’s important in the world, right now.

This discovery engine, combined with our existing dashboard, provides a comprehensive solution for discovering what’s happening, and then keeping up with it over time.

Together, these two tools not only help you stay current, they provide compelling and deep insights about real-time trends, influencers, and emerging conversations.

All of this goes into public beta today.

An Amazing Team

I am very proud of what we are launching today. In many ways — while still just a step on a longer journey — it is the culmination of an idea I’ve been working on, thinking about, and dreaming of for decades, and I’d love you to give it a spin.

And I’m proud of my amazing technical team — they are the most talented technical team I’ve ever worked with in my more than 20 years in this field.

I have never seen such a small team deliver so much, so well. And Bottlenose is them – it is their creation and their brilliance that has made this possible. I am really so thankful to be working with this crew.

Welcome to the Bottlenose Public Beta

So what is Bottlenose anyway?

It is a real-time view of what’s actually important across all the major social networks — the first of its kind — what you might call a “now engine.”

This new service is not about information retrieval. It’s about information awareness. It’s not search, it’s discovery.

We don’t index the past, we map the present. That’s why I think it’s better to call it a discovery engine than a search engine. Search implies research towards a specific desired answer, whereas discovery implies exploration and curiosity.

We measure what the crowd is paying attention to now, and we build a living, constantly learning and evolving, map of the present.

Twitter has always encouraged innovation around their data, and that innovation is really what has fueled their rapid growth and adoption. We’ve taken them at their word and innovated.

We think that what we have built adds tremendous value to the ecosystem and to Twitter.

But while Twitter data is certainly very important and high volume, Bottlenose is not just about Twitter… we integrate the other leading social networks too: Facebook, LinkedIn, Google+, YouTube, Flickr, and even networks whose data comes through them, like Pinterest and Instagram. And we see RSS too.

We provide a very broad view of what’s happening across the social web — a view that is not available anywhere else.

Bottlenose is what you’d build if you got the chance to start over and work on the problem from scratch — a new and comprehensive vision for how to make sense of what’s happening across and within social networks.

We think it could be for the social web what Google was for the Web. Ok that’s a bold statement – and perhaps it’s wishful thinking – but we’re at least off to a good start here and we’re pushing the envelope farther than it has ever been pushed. Try it!

Oh, and one more thing: why the name? Dolphins are smart, they’re social, they hunt in pods, and they have sonar. We chose the name as an homage to their bright and optimistic social intelligence. We felt it was a good metaphor for how we want to help people surf the Stream.

Thanks for reading this post, and thanks for your support. If you have a few moments to spare today, we’d love it if you gave Bottlenose a try. And remember, it’s still a beta.

Note: It’s Still a Beta!

Before I get too deep into the tech and all the possibilities and potential I see in Bottlenose, I first want to make it very clear that this is a BETA.

We’re still testing, tuning, adding stuff, fixing bugs, and most of all learning from our users.

There will be bugs and things to improve. We know. We’re listening. We’re on it. And we really appreciate your help and feedback as we continue to work on this.

Want to Know More?

How Bottlenose Could Improve the Media and Enable Smarter Collective Intelligence

 

 

Keeping Up With the Stream — New Problems and Solutions

This is Part III of a series of articles on the new era of the Stream, a new phase of the Web.

In Part I, The Message is the Medium, I explored the shift in focus on the Web from documents to messages.

In Part II, Drowning in the Stream, we dove deep into some of the key challenges the Stream brings with it.

Here in Part III, we will discuss new challenges and solutions for keeping up with streams as they become increasingly noisy and fast-moving.

 

Getting Attention in Streams

Today if you post a message to Twitter, you have a very small chance of that message getting attention. What’s the solution?

You can do social SEO and try to come up with better, more attention-grabbing, search-engine-attracting headlines. You can try to schedule your posts to appear at optimal times of day. You can even try posting the same thing many times a day to increase the chances of it being seen.

This last tactic is called “Repeat Posting” and it’s soon going to be clogging up all our streams with duplicate messages. Why am I so sure this is going to happen? Because we are in an arms race for attention. In a room where everyone is talking, everyone starts talking louder, and soon everyone is shouting.

Today when you post a message to Twitter, the chances of getting anyone’s attention are low, and they are getting lower. If you have a lot of followers, the chances are a little better that at least some of them may be looking at their stream at precisely the time you post. But even with a lot of followers, most of your followers probably won’t be online at the precise moment you post something, and so they’ll miss it.

Scheduled Posting

But it turns out there are optimal times of day to post, when more of your followers are likely to be looking at their streams. A new category of apps, typified by Buffer, has emerged to help you schedule your Tweets to post at such optimal times.

Using apps like Buffer, you can get more attention for your Tweets, but this is only a temporary solution, because the exponential growth of the Stream means that soon even posting a message at an optimal time will not be enough to get it in front of everyone who should see it.

Repeat Posting

To really get noticed, above the noise, you need your message to be available at more than one optimal time, for example many times a day, or even every hour.

To achieve this, instead of posting a message once at the optimal time per day, we may soon see utilities that automatically post the same message many times a day – maybe every hour – perhaps with slightly different wording of headlines, to increase the chances that people will see them. I call this “repeat posting” or “message rotation.”

Repeat posting tools may get so sophisticated that they will A/B test different headlines and wordings and times of day to see what gets the best clickthroughs and then optimize for those. These apps may even intelligently rotate a set of messages over several days, repeating them optimally until they squeeze out every drop of potential attention and traffic, much like ad servers and ad networks rotate ads today.
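To illustrate how such a tool might work, here is a purely hypothetical sketch (not any real product’s API; the class, fields, and scoring are my own assumptions) of a message rotator that A/B tests headline variants and posts the current best performer at each scheduled time:

```typescript
// Illustrative sketch of a "message rotation" scheduler of the kind described
// above. All names and the scoring rule are hypothetical assumptions.

interface Variant { headline: string; clicks: number; impressions: number; }

class MessageRotator {
  constructor(
    private variants: Variant[],
    private postFn: (text: string) => Promise<void>, // caller supplies the actual posting call
  ) {}

  // Pick the variant with the best observed clickthrough rate so far,
  // using a smoothed CTR so untested variants still get a chance (simple A/B logic).
  private pick(): Variant {
    return this.variants.reduce((best, v) => {
      const score = (v.clicks + 1) / (v.impressions + 2);
      const bestScore = (best.clicks + 1) / (best.impressions + 2);
      return score > bestScore ? v : best;
    });
  }

  // Post the chosen wording at each scheduled "optimal" time.
  async postAt(times: Date[]): Promise<void> {
    for (const time of times) {
      const delay = time.getTime() - Date.now();
      if (delay > 0) await new Promise((resolve) => setTimeout(resolve, delay));
      const variant = this.pick();
      variant.impressions += 1;
      await this.postFn(variant.headline);
    }
  }
}
```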

But here’s the thing — as soon as anyone starts manually or automatically using repeat posting tactics, it will create an arms race – others will notice it, and compete for attention by doing the same thing. Soon everyone will have to post repeatedly to simply get noticed above the noise of all the other repeat posts.

This is exactly what happens when you are speaking in a crowded room. In a room full of people who are talking at once, some people start talking louder. Soon everyone is shouting and losing their voice at the same time.

This problem of everyone shouting at once is what is soon going to happen on Twitter and Facebook and other social networks. It’s already happening in some cases – more people are posting the same message more than once a day to get it noticed.

It’s inevitable that repeat posting behavior will increase, and when everyone starts doing it, our channels will become totally clogged with redundancy and noise. They will become unusable.

What’s the solution to this problem?

What to Do About Repeat Posting

One thing that is not the solution is to somehow create rules against repeat posting. That won’t work.

Another solution that won’t work is to attempt to detect and de-dupe repeats that occur. It’s hard to do this, and easy to create repeat posts that have different text and different links, to evade detection.

Another solution might be to recognize that repeat posting is inevitable, but to make the process smarter: Whenever a repeat posting happens, delete the previous repeat post. So at any given time the message only appears once in the stream. At least this prevents people from seeing the same thing many times at once in a stream. But it still doesn’t solve the problem of people seeing messages come by that they’ve seen already.

A better solution is to create a new consumption experience for keeping up with streams, where relevant messages are actually surfaced to users, instead of simply falling below the fold and getting buried forever. This would help to ensure that people would see the messages that were intended for them, and that they really wanted to see.

If this worked well enough, there would be less reason to do scheduled posting, let alone repeat posting. You could post a message once, and there would be a much better chance of it being seen by your audience.

At Bottlenose, we’re working on exactly this issue in a number of ways. First of all, the app computes rich semantic metadata for messages in streams automatically, which makes it possible to filter them in many ways.

Bottlenose also computes the relevance of every message to every user, which enables ranking and sorting by relevancy, and the app provides smart automated assistants that can help to find and suggest relevant messages to users.
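As a rough illustration of the relevance piece (a sketch under assumed field names, weights, and thresholds, not the actual Bottlenose algorithm), per-user relevance can be as simple as scoring each message’s topic overlap with the user’s interest graph and surfacing only the messages above a threshold:

```typescript
// Minimal sketch of per-user relevance suggestion as described above.
// Field names and the threshold are illustrative assumptions.

interface AnnotatedMessage { text: string; topics: string[]; }
interface Profile { interests: Set<string>; }

// Relevance here is just the fraction of the message's topics the user cares about.
function relevance(msg: AnnotatedMessage, user: Profile): number {
  if (msg.topics.length === 0) return 0;
  const overlap = msg.topics.filter((t) => user.interests.has(t)).length;
  return overlap / msg.topics.length;
}

// "Assistant"-style suggestion: return the messages worth surfacing to the user
// even if they would otherwise scroll past them.
function suggest(stream: AnnotatedMessage[], user: Profile, threshold = 0.5): AnnotatedMessage[] {
  return stream.filter((m) => relevance(m, user) >= threshold);
}
```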

We’re only at the beginning of this and these features are still in early beta, but already we’re seeing significant productivity gains.

Fast-Moving Streams

As message volume increases exponentially in streams, our streams are not just going to be noisier, they are going to move faster. When we look at any stream there will be more updates per minute – more new messages scrolling in – and this will further reduce the chances of any message getting noticed.

Streams will begin to update so often they will literally move all the time. But how do you read, let alone keep up with, something that’s always moving?

Today, if you follow a Twitter stream for a breaking news story, such as a natural disaster like the tsunami in Japan, or the death of Steve Jobs, you can see messages scrolling in, in real time, every second.

In fact, when Steve Jobs died, Twitter hit a record peak of around 50,000 Tweets per minute. If you were following that topic on Twitter at that time, the number of new messages pouring in was impossible to keep up with.

Twitter has put together a nice infographic showing the highest Tweets Per Second events of 2011.

During such breaking news events, if you are looking at a stream for the topic, there is not even time to read a message before it has scrolled below the fold and been replaced by a bunch of more recent messages. The stream moves too fast to even read it.

But this doesn’t just happen during breaking news events. If you simply follow a lot of people and news sources, you will see that you start getting a lot of new messages every few minutes.

In fact, the more people and news sources, saved searches, and lists that you follow, the higher the chances are that at any given moment there are going to be many new messages for you.

Even if you just follow a few hundred people, the chances are pretty high that you are getting a number of new messages in Twitter and Facebook every minute. That’s way more messages than you get in email.

And even if you don’t follow a lot of people and news sources – even if you diligently prune your network, unfollow people, and screen out streams you don’t want, the mere exponential growth of message volume in coming years is soon going to catch up with you. Your streams are going to start moving faster.

But are there any ways to make it easier to keep up with these “whitewater streams?”

Scrolling is Not the Answer

One option is to just make people scroll. Since the 1990s, UX designers have been debating the issue of scrolling. Scrolling works, but it doesn’t work well when the scrolling is endless, or nearly endless. The longer the page, the lower the percentage of users who will scroll all the way down.

This becomes especially problematic if users are asked to scroll in long pages – for example infinite streams of messages going back from the present to the past (like Twitter, above). The more messages in the stream, the less attention those messages that are lower in the stream, below the fold, will get.

But that’s just the beginning of the problem. When a stream is not only long, but it’s also moving and changing all the time, it becomes much less productive to scroll. As you scroll down new stuff is coming in above you, so then you have to scroll up again, and then down again. It’s very confusing.

In long streams that are also changing constantly, engagement statistics are likely to be very different than for scrolling down static pages. I think engagement will be much lower the farther down such dynamic streams one goes.

Pausing the Scroll is Not the Answer

Some apps handle this problem of streams moving out from under you by pausing auto-scrolling as you read – they simply notify you that there are new messages above whatever you are looking at. You can then click to expand the stream above and see the new messages. Effectively they make dynamic streams behave as if they are not dynamic, until you are ready to see the updates.

This at least enables you to read without the stream moving out from under you. It’s less disorienting that way. But in fast moving streams where there are constantly new updates coming in, you have to click on the “new posts above” notification frequently, and it gets tedious.

For example, here is Twitter, on a search for Instagram, a while after the news of their acquisition by Facebook. After waiting only a few seconds, there are 20 new tweets already. If you click the bar that says “20 new Tweets” they expand. But by the time you’ve done that and started reading them, there are 20 more.

 

Simply clicking to read “20 new tweets” again and again is tedious. And furthermore, it doesn’t really help users cope with the overwhelming number of messages and change in busy streams.

The problem here is that streams are starting to move faster than we can read, even faster than we can click. How do you keep up with this kind of change?

Tickers and Slideshows Are Helpful

Another possible solution to the problem of keeping up with moving streams is to make the streams become like news tickers, constantly updating and crawling by as new stuff comes in. Instead of trying to hide the movement of the stream, make it into a feature.

Some friends and I have tested this idea out in an iPad app we built for this purpose called StreamGlider. You can download StreamGlider and try it out for yourself.

StreamGlider shows streams in several different ways — including a ticker mode and a slideshow mode where streams advance on their own as new messages arrive.

 

The Power of Visualization

Another approach to keeping up with fast moving streams is to use visualization, like we’re doing in Bottlenose, with our Sonar feature. By visualizing what is going on in a stream you can provide a user with instant understanding of what is in the stream and what is important and potentially interesting to them, without requiring them to scroll, skim or read everything first.

Sonar reads all the messages in any stream, applies natural language and semantic analysis to them, detects and measures emerging topics, and then visualizes them in realtime as the stream changes.

It shows you what is going on in the stream – in that pile of messages you don’t have time to scroll through and read. As more messages come in, Sonar updates in realtime to show you what’s new.

You can click on any trend in Sonar that interests you, to quickly zoom into just the messages that relate.

The beauty of this approach is that it avoids scrolling until you absolutely want to. Instead of scrolling, or even skimming the messages in a stream, you just look at Sonar and see if there are any trends you care about. If there are, you click to zoom in and see only those messages. It’s extremely effective and productive.
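For readers who want a feel for how this kind of visualization is driven, here is a minimal sketch of realtime trend detection (the window sizes and thresholds are arbitrary assumptions, and this is not the actual Sonar algorithm): compare each topic’s recent frequency against a decayed baseline and surface the biggest risers.

```typescript
// Minimal sketch of realtime trend detection: compare each topic's recent
// frequency against its longer-term baseline and report the biggest risers.
// Thresholds and decay rates are illustrative assumptions, not Sonar itself.

interface TopicStats { recent: number; baseline: number; }

const stats = new Map<string, TopicStats>();

// Call for every topic mention observed in the stream.
function observe(topic: string): void {
  const s = stats.get(topic) ?? { recent: 0, baseline: 0 };
  s.recent += 1;
  stats.set(topic, s);
}

// Call on a fixed interval: report topics spiking versus their baseline, then
// fold the recent window into the baseline with exponential decay.
function trendingTopics(minLift = 2.0): string[] {
  const trending: string[] = [];
  for (const [topic, s] of stats) {
    const expected = Math.max(s.baseline, 0.5); // avoid divide-by-zero for new topics
    if (s.recent / expected >= minLift) trending.push(topic);
    s.baseline = 0.8 * s.baseline + 0.2 * s.recent; // decayed average
    s.recent = 0;
  }
  return trending;
}
```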

Sonar is just one of many visualizations that could help with keeping up with change in huge streams. But it’s also only one piece of the solution. Another key piece of the solution is finding things in streams.

Finding Things in Streams

Above, we discussed problems and solutions related to keeping up with streams that are full of noise and constantly changing. Now let’s discuss another set of problems and solutions related to finding things in streams.

Filtering the Stream

For a visualization like Sonar to be effective, you need the ability to filter the stream for the sources and messages you want, so there isn’t too much noise in the visualization. The ability to filter the stream for just those subsets of messages you actually care about is going to be absolutely essential in coming years.

Streams are going to become increasingly filled with noise. But another way to think about noisy streams is that they are really just lots of less-noisy streams multiplexed together.

What we need is a way to intelligently and automatically de-multiplex them back into their component sub-streams.

For example, take the stream of all the messages you receive from Twitter and Facebook combined. That’s probably a pretty noisy stream. It’s hard to read, hard to keep up with, and quickly becomes a drag.

In Bottlenose you can automatically de-multiplex your streams into a bunch of sub-streams that are easier to manage. You can then read these, or view them via Sonar, to see what’s going on at a glance.

For example, you can instantly create sub-streams – which are really just filters on your  stream of everything. You might make one for just messages by people you like, another for just messages by influencers, another for just news articles related to your interests, another for just messages that are trending, another of just photos and videos posted by your friends, etc.
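Conceptually, each sub-stream is just a named predicate over the merged stream. Here is a small sketch of that idea; the message fields and the particular filters are assumptions for illustration, not how Bottlenose is actually implemented:

```typescript
// Sketch of "de-multiplexing" a merged stream into sub-streams with simple
// predicate filters, as described above. Message fields are assumptions.

interface StreamMessage {
  author: string;
  text: string;
  hasMedia: boolean;
  authorInfluence: number; // 0..1, normalized influence score
}

type Filter = (m: StreamMessage) => boolean;

const subStreams: Record<string, Filter> = {
  friends: (m) => ['alice', 'bob'].includes(m.author),   // people you like (hypothetical list)
  influencers: (m) => m.authorInfluence > 0.8,
  media: (m) => m.hasMedia,
  news: (m) => /\bhttps?:\/\//.test(m.text),              // crude: messages carrying links
};

// Route every incoming message into whichever sub-streams match it.
function demultiplex(stream: StreamMessage[]): Record<string, StreamMessage[]> {
  const result: Record<string, StreamMessage[]> = {};
  for (const [name, filter] of Object.entries(subStreams)) {
    result[name] = stream.filter(filter);
  }
  return result;
}
```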

The ability to filter streams – to mix them and then unmix them – is going to be an essential tool for working with streams.

Searching the Stream

In the first article in this series we saw how online attention and traffic is shifting from search to social. Social streams are quickly becoming key drivers for how content on the Web is found. But how are things found in social streams? It turns out existing search engines, like Google, are not well-suited for searching in streams.

Existing algorithms for Web search do not work well for Streams. For example, consider Google’s PageRank algorithm.

In order to rank the relevancy of Web pages, PageRank needs a very rich link structure. It needs a Web of pages with lots of links between the documents. The link structure is used to determine which pages are the best for various topics. Effectively links are like votes – when pages about a topic link to other pages about that topic, they are effectively voting for or endorsing those pages.
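For reference, the standard published form of PageRank makes this dependence on link structure explicit:

```latex
% Standard form of PageRank (Brin & Page), included here for reference:
% d is the damping factor (typically ~0.85), N the total number of pages,
% M(p_i) the set of pages linking to p_i, and L(p_j) the number of
% outbound links on page p_j.
PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}
```

With messages that carry few or no inbound links, the sum in that formula has almost nothing to work with, which is exactly the problem described next.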

While PageRank may be ideal for figuring out what Web pages are best, it doesn’t help much for searching messages, because messages may have no links at all, or may be only very sparsely linked together. There isn’t enough data in individual messages to figure out much about them.

So how do you know if a given message is important? How do you figure out what messages in a stream actually matter?

When searching the stream, instead of finding everything, we need to NOT find the stuff we don’t want. We need to filter out the noise. And that requires new approaches to search. We’ve already discussed filtering above, and the ability to filter streams is a prerequisite for searching them intelligently. Beyond that, you need to be able to measure what is going on within streams, in order to detect emerging trends and influence.

The approach we’re taking in Bottlenose to solve this is a set of algorithms we call “StreamRank.” In StreamRank we analyze the series of messages in a stream to figure out what topics, people, links and messages are trending over time.

We also analyze the reputations or influence of message authors, and the amount of response (such as retweets or replies or likes) that messages receive.

In addition, we also measure the relevance of messages and their authors to the user, based on what we know of the user’s interest graph and social graph.

This knowledge enables us to rank messages in a number of ways: by date, by popularity, by relevance, by influence, and by activity.
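As a hedged sketch of what a StreamRank-style score might combine (the half-life and weights below are illustrative assumptions, not our actual algorithm), one can blend recency, crowd response, and author influence into a single sortable number; the other sort modes would just swap in a different comparator:

```typescript
// Hedged sketch of StreamRank-style scoring: blend recency, crowd response,
// and author influence into one sortable score. Weights are illustrative.

interface RankedMessage {
  postedAt: Date;
  retweets: number;
  replies: number;
  likes: number;
  authorInfluence: number; // 0..1, normalized reputation score
}

function streamRank(m: RankedMessage, now: Date = new Date()): number {
  const ageHours = (now.getTime() - m.postedAt.getTime()) / 3_600_000;
  const recency = Math.pow(0.5, ageHours / 6);                     // 6-hour half-life
  const response = Math.log1p(m.retweets + m.replies + m.likes) / 10; // diminishing returns
  return 0.4 * recency + 0.4 * Math.min(response, 1) + 0.2 * m.authorInfluence;
}

// Sort a stream by this blended popularity/activity score, highest first.
function sortByRank(stream: RankedMessage[]): RankedMessage[] {
  return [...stream].sort((a, b) => streamRank(b) - streamRank(a));
}
```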

Another issue that comes up when searching the Stream is that many messages in streams are quite strange looking – they don’t look like properly formed sentences or paragraphs. They don’t look like English, for example. They contain all sorts of abbreviations, hashtags, @replies, and short URLs, and they often lack punctuation and are scrunched to fit into 140-character Twitter messages.

Search algorithms that use any kind of linguistics, disambiguation, natural language processing, or semantics, don’t work well out of the box on these messy messages.

To apply such techniques you need to rewrite them so that they work on short, messy, strange looking messages. This is also something we’ve built in Bottlenose — we’ve built a new natural language processing and topic detection engine in Javascript that is designed specifically to handle these types of streams and messages.
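A sketch of the kind of stream-aware tokenization this requires (illustrative only, not the actual Bottlenose engine): pull out mentions, hashtags, and URLs first, then treat whatever remains as plain words.

```typescript
// Sketch of a tokenizer for short, messy social messages: extract mentions,
// hashtags, and URLs first, then split the remainder into plain words.
// Illustrative only; not the actual Bottlenose NLP engine.

interface MessageTokens {
  mentions: string[];
  hashtags: string[];
  urls: string[];
  words: string[];
}

function tokenize(text: string): MessageTokens {
  const mentions = text.match(/@\w+/g) ?? [];
  const hashtags = text.match(/#\w+/g) ?? [];
  const urls = text.match(/https?:\/\/\S+/g) ?? [];
  // Strip the special tokens, then split what remains on non-word characters.
  const stripped = [...mentions, ...hashtags, ...urls]
    .reduce((t, token) => t.replace(token, ' '), text);
  const words = stripped.split(/\W+/).filter((w) => w.length > 1);
  return { mentions, hashtags, urls, words };
}

// Example: tokenize("RT @alice: Loving the new #Bottlenose beta http://t.co/xyz")
// => mentions ["@alice"], hashtags ["#Bottlenose"], urls ["http://t.co/xyz"],
//    words ["RT", "Loving", "the", "new", "beta"]
```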

These are some of the new challenges and solutions we’re applying in Bottlenose to make working with streams more productive. They are components of what we call our “StreamOS,” a new high-level Javascript and HTML5 operating system for applications that need to do smart things with streams. We’ll be writing a lot more about this in future articles.

 

Drowning in the Stream — New Challenges for a New Web

This is Part II of a three-part series of articles on how the Stream is changing the Web.

In Part I of this series, The Message is the Medium, I wrote about some of the shifts that are taking place as the center of online attention shifts from documents to messages.

Here in Part II, we will explore some of the deeper problems that this shift is bringing about.

New Challenges in the Era of the Stream

Today the Stream has truly arrived. The Stream is becoming primary and the Web is becoming secondary. And with this shift, we face tremendous new challenges, particularly around overload. I wrote about some of these problems for Mashable in an article called, “Sharepocalypse Now.”

The Sharepocalypse is here. It’s just too easy to share, there is too much stuff being shared, there are more people sharing, and more redundant ways to share the same things. The result is that we are overloaded with messages coming at us from all sides.

For example, I receive around 13,000 messages/day via various channels, and I’m probably a pretty typical case. You can see a more detailed analysis here.

As the barrier to messaging has become lower and people have started sending more messages than ever before, messaging behavior has changed. What used to be considered spam is now considered to be quite acceptable.

Noise is Increasing

In the 1990s, emailing a photo of the interesting taco you were having for lunch to everyone you know would have been considered highly spammy behavior. But today we call that “foodspotting,” and we happily send out pictures of our latest culinary adventure on multiple social networks at once.

Spam is the New Normal

It’s not just foodspotting – the same thing is happening with check-ins, and with the new behavior of “pinning” things (the new social bookmarking) that is taking place in Pinterest. Activities that used to be considered noise have somehow started to be thought of as signal. But in fact, for most people, they are still really noise.

The reason this is happening is that the barrier to sharing is much lower than it once was. Email messages took some thought to compose – they were at least a few paragraphs long. But today you can share things that are 140 characters or less, or just a photo without even any comments. It’s instant and requires no investment or thought.

Likewise, in the days of email you had to at least think, “is it appropriate to send this or will it be viewed as spam?” Today people don’t even have that thought anymore. Send everything to everyone all the time. Spam is the new normal.

Sharing is a good thing, but like any good thing, too much of it becomes a problem.

The solution is not to get people to think before sharing, or to share less, or to unfollow people, or to join social networks where you can only follow a few people (like Path or Pair), it’s to find a smarter way to deal with the overload that is being created.

Notifications Overload

Sharing is not the only problem we’re facing. There are many other activities that generate messages as well. For example, we’re getting increasing numbers of notification messages from apps. These notifications are not the result of a person sharing something; they are the result of an app wanting to get our attention.

We’re getting many types of notifications, for example:

  • When people follow us
  • When we’re tagged in photos
  • When people want to be friends with us
  • When there are news articles that match our interests
  • When friends check-in to various places
  • When people are near us
  • When our flights are delayed
  • When our credit scores change
  • When things we ordered are shipped
  • When there are new features in apps we use
  • When issue tickets are filed or changed
  • When files are shared with us
  • When people mention or reply to us
  • When we have meeting invites, acceptances, cancellations, or meetings are about to start
  • When we have unread messages waiting for us in a social network

The last bullet deserves extra mention. I have noticed that LinkedIn, for example, sends me these notifications about notifications. Yes, we are even getting notifications about notifications!

When you get messages telling you that you have messages, that’s when you really know the problem is getting out of hand.

Fragmented Attention

Another major problem that the Stream is bringing about is the fragmentation of attention.

Today email is not enough. As if it weren’t enough work that we each have several email inboxes to manage, we are now also getting increasing volumes of messages outside of email, in entirely separate inboxes for specialized apps. We have too many inboxes.

It used to be that to keep up with your messages all you needed was an email client.

Then the pendulum swung to the Web and it started to become a challenge to keep up with all the Web sites we needed to track every day.

So RSS was invented and for a brief period it seemed that the RSS reader would be adopted widely and solve the problem of keeping up with the Web.

But then social networks came out and they circumvented RSS, forcing users to keep up in social-network specific apps and inboxes.

So a new class of “social dashboard” apps (like Tweetdeck) were created to keep up with social networks, but they didn’t include email or RSS, or all the other Web apps and silos.

This trend towards fragmentation has continued – a growing array of social apps and web apps can only be adequately monitored within those same apps. You can’t effectively keep up with them in email, in RSS, or via social networks. You have to log in to each app to get high-fidelity information about what is going on.

We’re juggling many different inboxes. These include email, SMS, voicemail, Twitter, Facebook, LinkedIn, Pinterest, Tumblr, Google+, YouTube, Yammer, Dropbox, Chatter, Google Reader, Flipboard, Pulse, Zite, as well as inboxes in specialized tools like Github, Uservoice, Salesforce, and many other apps and services.

Alan Lepofsky, at Constellation Research, created a somewhat sarcastic graph to illustrate this problem, in his article, “Are We Really Better Off Without Email?” The graph is qualitative – it’s not based on direct numbers – but in my opinion it is probably very close to the truth.

What this graph shows is that email usage peaked around 2005/2006, after which several new forms of messaging began to get traction. As these new apps grew, they displaced email for some kinds of messaging activities, but more importantly, they fragmented our messaging and thus our attention.

The takeaway from this graph is that we will all soon be wishing for the good old days of email overload. Email overload was nothing compared to what we’re facing now.

The Message Volume Explosion

As well as increasing noise and the fragmentation of the inbox, we’re also seeing huge increases in message volume.

Message volume per day, in all messaging channels, is growing. In some of these channels, such as social messaging, it is growing exponentially. For example, look at this graph of Twitter’s growth in message volume per day since 2009, from the Bottlenose blog:

Twitter now transmits 340 million messages per day, more than double its volume in March of 2011.

If this trend continues then in a year there will be between 500 million and 800 million messages per day flowing through Twitter.
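The projection above is just simple compounding. Here is the back-of-the-envelope arithmetic as a small JavaScript sketch; the 1.5x and 2.3x growth factors are my own illustrative assumptions, chosen only to bracket the 500–800 million range, while the roughly 2x annual growth comes from the figures above.

```javascript
// Back-of-the-envelope extrapolation only -- not a forecast.
const messagesPerDayNow = 340e6; // ~340 million messages/day, the figure cited above

function projectVolume(current, annualGrowthFactor, years) {
  return current * Math.pow(annualGrowthFactor, years);
}

// Growth factors of 1.5x and 2.3x per year bracket the 500-800 million/day range.
console.log(projectVolume(messagesPerDayNow, 1.5, 1)); // 510,000,000
console.log(projectVolume(messagesPerDayNow, 2.3, 1)); // ~782,000,000
```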

And that’s just Twitter – Facebook, Pinterest, LinkedIn, Google+, Tumblr, and many other streams are also growing. Even email volume is increasing, thanks to all the notifications that various apps send to email.

Message volume is growing across all channels. This is going to have several repercussions for all of us.

Engagement is Threatened

First of all, the signal-to-noise ratio of social media, and other messaging channels, is going to become increasingly bad as volume increases. There’s going to be less signal and more noise. It is going to get harder to find the needles in the haystack that we want, because there is going to be so much more hay.

Today, on services like Twitter and Facebook, signal-to-noise is barely tolerable already. But as this situation gets worse in the next two years, we are going to become increasingly frustrated. And when this happens we are going to stop engaging.

When the signal-to-noise ratio in a channel gets too far out of hand, that channel becomes unproductive and inefficient to use. In the case of social media, we are right on the cusp of this happening. And when it happens, people will simply stop engaging. And when engagement falls, the entire premise of social media will start to fail.

This is already starting to happen. A recent article by George Colony, CEO of the analyst firm Forrester Research, cites a study finding that 56% of time spent on social media is wasted.

When you start hearing numbers like this, it means that consumers are not getting the signal they need most of the time, and this will inevitably result in a decrease in satisfaction and engagement.

What’s Next?

We have seen some of the issues that are coming about, or may soon come about, as the Stream continues to grow. But what’s going to happen next? How is the Stream, and our tools for interacting with it, going to adapt?

Click here to read Part III of this series, Keeping Up With the Stream, where we’ll explore various approaches to solving these problems.

The Message is the Medium – Attention is Shifting from the Web to the Stream

Shift Happens

A major shift has taken place on the Web. Web pages and Web search are no longer the center of online activity and attention. Instead, the new center of attention is messaging and streams. We have moved from the era of the Web to the era of the Stream. This changes everything.

Back in 2009, I wrote an article called “Welcome to the Stream – Next Phase of the Web” which discussed the early signs of this shift. Around the same time, Erick Schonfeld, at TechCrunch, also used the term in his article, “Jump Into the Stream.” Many others undoubtedly were thinking the same thing: The Stream would be the next evolution of the Web.

What we predicted has come to pass, and now we’re in this new landscape of the Stream, facing new challenges and opportunities that we’re only beginning to understand.

In this series of articles I’m going to explore some of the implications of this shift to the Stream, and where I think this trend is going. Along the way we’re going to dive deep into some major sea changes, emerging problems, and new solutions.

From Documents to Messages

The shift to the Stream is the latest step in a cycle that seems to repeat. Online attention appears to swing like a pendulum from documents to messages and back every few decades.

Before the advent of the Web, the pendulum was swinging towards messaging. The center of online attention was messaging via email, chat and threaded discussions. People spent most of their online time doing things with messages. Secondarily, they spent time in documents, for example doing word-processing.

Then the Web was born and the pendulum swung rapidly from messages to documents. All of a sudden Web pages – documents – became more important than messages. During this period the Web browser became more important than the email client.

But with the growth of social media, the pendulum is swinging back from documents to messaging again.

Today, our online attention is increasingly focused on messages, not Web pages. We are getting more messages, and more types of messages, from more apps and relationships than ever before.

We’re not only getting social messages, we’re getting notification messages. And they are coming to us from more places – especially from social networks, content providers, and social apps of all kinds.

More importantly, messages are now our starting points for the Web — we are discovering things on the Web from messages. When we visit Web pages, it’s more often a result of us finding some link via a message that was sent to us, or shared with us. The messages are where we begin, they are primary, and Web pages are secondary.

From Search to Social

Another sign of the shift from the Web to the Stream is that consumers are spending more time in social sites like Facebook, Pinterest and Twitter than on search engines or content sites.

In December of 2011, Comscore reported that social networking ranked as the most popular content category in online engagement, accounting for 19% of all consumer time spent online.

These trends have led some, such as VC Fred Wilson, to ask, “how long until social drives more traffic than search?” Fred’s observation was that his own blog was getting more traffic from social media sites than from Google.

Ben Elowitz, the CEO of Wetpaint, followed up on this by pointing out that according to several sources of metrics, the shift to social supplanting search as the primary traffic driver on the Web was well underway.

According to Ben’s analysis, the top 50 sites were getting almost as much traffic from Facebook as from Google by December of 2011. Seven of these top 50 sites were already getting 12% more visits from Facebook than from Google, up from five of these top sites just a month earlier.

The shift from search to social is just one of many signs that the era of the Stream has arrived and we are now in a different landscape than before.

The Web has changed, the focus is now on messages, not documents. This leads to many new challenges and opportunities. It’s almost as if we are in a new Web, starting from scratch – it’s 1994 all over again.

Click here to continue on to Part II of this series, Drowning in the Stream, where we’ll dig more deeply into some of the unique challenges of the Stream.

The Problem with Stream 3.0

After my former project, Twine.com, was sold, I began to turn my attention to the Next Big Challenge: how to make sense of the growing real-time Web, or what many call “the Stream.”

I could see the writing on the wall, and it was less than 140 characters: Social media’s own success was going to be its biggest challenge. The Stream was going to soon become unusable.

In the early days of the Stream, it was actually possible to keep up with your community on Twitter and Facebook effectively. Not anymore. There are just too many people messaging too often. The chances of even seeing a message before it scrolls into history are getting lower every day.

Today, the Stream is growing exponentially. Twitter famously grew by 3x in the last year and sends out more than 250 million Tweets per day. Facebook sends billions of public and private messages per day. And this is just the tip of the iceberg — or the deluge, as it were.

There are so many new and growing sources of messages in the Stream: Google+, LinkedIn, Foursquare, YouTube, RSS feeds, and more are coming. And that’s just the consumer side of the Stream – there’s a whole other side to it: Chatter, Yammer, Socialcast, Jive, and many other enterprise streams are also growing rapidly.

And on top of this there is a whole new deluge of machine and app-generated data that is just starting to join the stream, and may eventually dwarf human-generated data.

At the same time as all these new networks are popping up to enable messaging in the Stream, the barrier to creating and sharing messages has also never been lower. I call this The Sharepocalypse.

It’s never been easier to share — people are sharing more kinds of information, more often, with more people than ever before. And it requires less thought too, because the messages themselves are so short. The result is a collective overshare of unimagined proportions.

With email, the messages were usually long and required some effort, so people sent relatively few emails per day. And at least with email there were some basic social rules about what you could send to everyone without being a spammer.

Not anymore. In the age of the Stream it’s quite normal to post out what you had for lunch, or some cool product you are looking at in a store window, with a photo, to the entire world. That would have been unthinkable in the email era. In the age of the Stream, it’s not even an afterthought. The Sharepocalypse is here, in spades.

The result of all this adoption and growth of the Stream is a new kind of information overload, stream overload.

Stream Overload is worse than email overload, because it includes email overload.

Email, in my opinion, was “Stream 1.0.” Social media (RSS, Twitter, Facebook, etc.) was “Stream 2.0.” And now we’re entering “Stream 3.0” – when everything – all information, all applications, everyone, even things – becomes part of the Stream.

(Yes I know, version numbers are so Web 3.0, but it’s helpful to use them as handles for the discussion. Stream 3.0 is indeed a different era from the early days of the Stream.)

We’re already seeing the signs of stream overload — but this is just a preview of what’s to come as Stream 3.0 comes to maturity. The growth of the Stream is still only just beginning. Most of the planet isn’t using it yet. And most people don’t realize how integral it’s going to be in their lives in coming years.

If the Web is the planet’s brain, the Stream is its mind – it’s the living, breathing, thinking, learning, aware, acting part. And we’re all going to be part of it 24/7, whether we like it or not. So it better be good, it better be smart, it better be useable, or we’re all going to be gridlocked and buried in messages we don’t want.

And this is the Next Big Problem: the Stream is going to become both more important and more noisy at the same time. This is a classic crisis: either something is done to reduce the noise, or the Stream stops being usable. And that will be a serious problem, because we will increasingly depend on it being usable.

What happens if the Stream really breaks down under its own weight?

If the signal-to-noise problem isn’t solved, and people can’t keep up with the Stream, they’re going to give up. They’re going to stop paying attention. They’re going to stop trying to keep up. They will never be able to scroll down enough. They won’t even login to sites like Twitter and Facebook if they are too overloaded.

And if nobody is there listening, then there won’t be much point in posting news and updates to the Stream either. People will stop posting too.

And without the people there, marketers won’t post either – so the advertising money will go away. And even in the social enterprise, if streams for teams get too noisy, they will also stop being used and people will move to some new solution.

And without the people there, the Stream will become an automaton. All that will be left is machines posting to machines.

Unless something is done to solve it, of course.

And something IS being done, it turns out. We’re launching Bottlenose tonight. To read more about the history of the project, read Bottlenose has Launched!

 

 

Notes

 

Make sure to follow us on Twitter: @bottlenoseapp

And come check out Bottlenose! The app is still in invite beta so you either have to have a high enough Klout score or an invite code to get in.

The first 500 readers of my blog who want to try it out, can get into Bottlenose using the invite code: novafriends

 

Check out what the press is saying about Bottlenose:

Bottlenose Intelligent Social Dashboard Launches Private Beta  — ReadWriteWeb

Bottlenose is a Game Changer for Social Media Consumption — Mashable

Bottlenose is a Social Media Dashboard That Makes Sense of the Stream – Venturebeat

Can This Startup Eliminate Social Media Overload? — Inc.

The Day of the Dolphin: Swim in the Personalized Stream With Bottlenose — SemanticWeb

Bottlenose Launch – A Smarter Way to Skim the Stream – SiliconAngle

Bottlenose is a Web-Based Twitter Client for Power Users — AllThingsD

Managing the Sharepocalypse — AdWeek

Can Bottlenose Help Prevent the Social Sharepocalypse? — GigaOm

Social Overload? Bottlenose Promises Intelligent Filtering — Information Week

 

 

Bottlenose has Launched!

Today, after almost two years of work in stealth, I am proud to announce the launch of Bottlenose.

While I have co-founded and serve on the boards of several other ventures (The Daily Dot, Live Matrix, StreamGlider, and others), Bottlenose is different from all my other projects in that I am also in a full-time day-to-day role as the CEO. In short, Bottlenose is what I’m putting the bulk of my time into going forward, although I will continue to angel invest and advise other startups.

The story of Bottlenose began when my good friend and advisor, Josh Jones-Dilworth, introduced me to Dominiek ter Heide after I sold my last company, Twine.com, in 2010.

Dominiek was at the time working on a new kind of personalization technology for social media. Meanwhile, I had been thinking about how to filter the Stream, and the emerging problem of the Sharepocalypse and what I have been calling “the Stream 3.0 Problem.”

Josh knew both of us and had a hunch that we were really thinking about the same problem from different angles. Dominiek and I started speaking via Skype and soon we teamed up. Bottlenose was officially born in 2010.

Working with Dominiek has been a true pleasure. He’s one of the most productive, talented software engineers I’ve ever met. It’s been an amazing ride so far. Soon, thanks to Dominiek, we were joined by an A-team of killer engineers with expertise in natural language processing, Node.js, Javascript, HTML 5, machine learning, cloud computing, NoSQL, and more.

Our little band of hotshots has produced an amazingly robust and powerful app — something that even large companies with huge engineering teams would be hard-pressed to develop. I’m honored to be working with these guys, and very proud of the team and what we’ve built.

We have also been fortunate to be joined by some terrific angel investors, including Andy Jenks, of Stage One Capital, and several others (see the About page on Bottlenose for the complete list).

So what is Bottlenose anyway? Well, one way to find out is to visit the site and check out the Tour there. But I’ll summarize here as well:

Bottlenose is the smartest social media dashboard ever built. It’s designed for busy people who make heavy use of social media: prosumers, influencers, professionals.

Bottlenose uses next-generation “stream intelligence” technology to understand the messages that are flowing through Twitter, Facebook and other social networks. It also learns about your interests.

On the basis of this knowledge, Bottlenose helps you filter your streams to find what matters to you, what’s relevant, and what’s most important. Bottlenose also includes many new features, like Sonar, which visualizes what’s going on in any stream, and powerful rules and automation capabilities to help you become more productive.
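To make the filtering idea concrete, here is a minimal JavaScript sketch of the general technique of scoring stream messages against a profile of interests. It is purely illustrative: the message shape, the interest weights, and the threshold are assumptions of mine, not Bottlenose’s actual algorithm.

```javascript
// Illustrative only: a toy relevance scorer for stream messages.
// The interest model, weights, and threshold below are assumptions,
// not Bottlenose's actual implementation.

const interests = { "semantic web": 0.9, "startups": 0.7, "javascript": 0.5 };

function scoreMessage(message) {
  const text = message.text.toLowerCase();
  let score = 0;

  // Boost messages that mention topics the user cares about.
  for (const [topic, weight] of Object.entries(interests)) {
    if (text.includes(topic)) score += weight;
  }

  // Give some weight to social signals such as shares.
  score += Math.log10(1 + (message.shares || 0)) * 0.3;

  return score;
}

const stream = [
  { text: "New JavaScript framework for the semantic web", shares: 42 },
  { text: "What I had for lunch today", shares: 1 },
];

// Keep only messages that clear a relevance threshold, best first.
const filtered = stream
  .map(m => ({ ...m, score: scoreMessage(m) }))
  .filter(m => m.score > 0.5)
  .sort((a, b) => b.score - a.score);

console.log(filtered);
```

A real system would combine far more signals than this, but the basic pattern of scoring, then filtering, then ranking is the same idea.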

This is just the beginning of this adventure. Our roadmap for Bottlenose is very ambitious, and it’s going to be a lot of fun, and hopefully will really make a difference too. We’re super excited about this product and we hope you will be as well.

Check back here for more posts and observations about Bottlenose and where I think social media is headed.

Make sure to follow us on Twitter: @bottlenoseapp

And come check out Bottlenose! The app is still in invite beta so you either have to have a high enough Klout score or an invite code to get in.

The first 500 readers of my blog who want to try it out, can get into Bottlenose using the invite code: novafriends

I look forward to seeing you on Bottlenose!

For more about the thinking behind Bottlenose, read The Problem with Stream 3.0

 

Announcing Common Crawl

Several years ago my friend Gil Elbaz (CEO of Factual; forefather of Google AdWords) approached me with an ambitious vision – he wanted to create an open not-for-profit crawl of the Web to ensure that everyone would have equal access to a Web-scale search index to build on and experiment with.

Search giants like Google and Microsoft were not likely to provide open access to their search indices because they couldn’t risk giving their crown jewels to potential competitors, and furthermore they were bound by the constraints of for-profit business models.

Gil felt that in the future it would be an important service to provide a truly open Web-scale search index that was not controlled by a for-profit company and was not bound by profit motives. This index would make it possible for startups to innovate in search, and for researchers and students to explore Web Science at scale, and furthermore it would level the playing field in search and distribute the index, preventing any one company from monopolizing the index of humanity’s knowledge.

As a longtime advocate of the open Web, I was excited by the vision Gil shared with me, and agreed to join the board of directors of what became The Common Crawl Foundation, along with Carl Malamud. Gil and lead engineer, Ahad Rana, then went to work actually building the thing. This was no small undertaking and required quite a bit of innovation and ingenuity. You can read about the cloud based solution that was developed here.

Several years later, after a lot of work, it’s starting to be ready for prime time, and so we’re happy to announce the Web’s first truly open, non-profit, 5-billion-page search index!

With the recent addition of our director, Lisa Green, from Creative Commons, Common Crawl is now beginning a new phase in its rollout, and a new phase for the open Web. You can read our inaugural blog post announcing the project here.

We hope you will come in and take a look around, and we look forward to seeing what you dream up and build with this data set.

 

Bottlenose Begins to Unstealth

It’s been a busy week for the team at Bottlenose, one of my coolest venture productions.

Bottleno.se has developed a very powerful new personalization system that is optimized for making sense of Twitter and other real-time information streams. The product is in alpha and invite beta is planned for June.

It began when TechCrunch broke the story about the company earlier this week.

That was followed by an interesting article by Marshall Kirkpatrick on the Twitter overload problem, and then a detailed article by Jenny Zaino about how bottleno.se hopes to solve that problem.

And there was also a false rumor that bottleno.se might get bought soon, which started spreading like wildfire online – but we’ve publicly stated that we are not looking to sell at this early stage, whether or not there is interest.

If you’re curious what all the buzz is about, sign up for the invite beta this summer. We’ll start letting folks into the beta on a rolling basis in June, in order of influence on the invite list, since the product is focused on influencers.

If you want to ensure that you get in early, you can show us your level of influence by getting other people to register for the beta with you, by tweeting or inviting friends via a special link we give you in the registration form. The more people who register via your links, the higher on our invite list you rise.

More news is coming soon, so follow @bottlenoseapp on Twitter, as well as @dominiek and @novaspivack (me) to keep up with us.

The Schedule of the Web: Live Matrix – Launched Tonight

Tonight I am pleased to announce that my next Big Idea has launched. It’s called Live Matrix and I invite you to come check it out.

Live Matrix is the schedule of the Web — we help you find out “What’s When on the Web” — the hottest live online events happening on the Web: concerts, interviews, live chat sessions, game tournaments, sales, popular Webshows, tech conferences, live streaming sports coverage, and much more.

It’s like what TV Guide was for TV, but it’s not for TV, it’s for the Web. There are all kinds of things happening online — and while Live Matrix includes a lot of live streaming video events, there is much more than just video in our guide. Live Matrix includes any type of scheduled online event — but we don’t include offline events — to be in Live Matrix, an event must enable people to participate online.

The site combines elements of a guide, a search engine, and a DVR, to help you discover events and then get reminded to attend them, or catch them later if you missed them.

The insight that led to Live Matrix was that the time dimension of the Web is perhaps its last big greenfield opportunity. It’s an entire dimension of the Web that nobody has built a search engine for, and that nobody is providing any guidance for. Nobody owns it yet — it’s a whole new frontier of the Web.

There are millions of scheduled events taking place online every day. Some of these events are very cool, some are very relevant — but there is no easy way to find out about them. To find out what’s happening when on TV for example, we have TV Guide, but there is no equivalent for finding out what’s happening when on the Web.

In my own case I kept finding out about cool online events that I would have participated in — concerts, conference streams, webinars, online debates and interviews, and sales —  if only I had known they were happening. I think many Internet users have experienced this.

Google, Yahoo and Bing all focus on what I call the “space dimension” of the Web — they help you find what’s where — where is the best page about topic x? — But they don’t help you find out what’s when — what’s happening now, what’s coming next. They only help you find out what’s already finished and done with. How do you find out what’s happening now? How do you know what’s upcoming?

It was an “aha moment” when this all became clear — there is a new opportunity to be the Google or Yahoo for the time dimension of the Web. Or at least to be the equivalent of a TV Guide for the Web.

Furthermore, all trends point to this being a big opportunity. The continued growth of the realtime Web (Twitter, etc.) and the emerging Live Web (video and audio streaming) have been discussed extensively in the media; most recently, comScore reported nearly a 650% increase in time spent viewing live video online.

So with this opportunity clearly in mind I set about looking for a co-founder who would be the right person to team up with, someone who would be the CEO.

That person was Sanjay Reddy. Soon after I met Sanjay it was clear to me that he was exactly the right guy to partner with: his background in media and technology was what impressed me (for example, he was head of corp dev, strategy and M&A at Gemstar-TV Guide, where he led the $2.3 billion sale of the company to Macrovision, and he had also worked at other Silicon Valley startups and investment banks).

Sanjay and I spent quite a bit of time just talking about ideas and eventually decided to join forces. My Lucid Ventures incubator, along with Sanjay, seed-funded the new venture and named it Live Matrix, to go after our mutual vision.

Soon after Sanjay joined we were fortunate to be joined by our two highly experienced colleagues, Edgar Fereira (formerly VP of data for TV Guide Data and TV Guide Online) and Tobias Batton (serial entrepreneur, product manager, game designer). Then others joined around us.

Eventually we formed a small (but awesome) startup team and began working on a prototype and eventually an alpha. We debuted a closed beta preview at TechCrunch Disrupt last spring and received enthusiastic reviews. Now, today, we are releasing our public beta.

Read the full press release here.

I hope you like what we’ve created so far. But please note it is still a BETA. We are interested in your feedback and we already have a lot of feedback from our private beta. Here are some of the ideas we are working on for our next few releases:

  • The Number One request we have received so far is to make it easier and faster for people to find events that would interest them. So for the remainder of the year one of our big priorities will be to add in more personalization and recommendations.
  • We’re also working on new UI concepts, including some more ways to view the schedule of the Web.
  • And we’re going to make it easier and faster for you to add events to Live Matrix — we’ll be launching improvements to our publisher tools section, as well as more ways for people to suggest events for us to list.
  • And we also plan to add new categories of events — for example, Business, Technology, Games, and more.

So stay tuned! Live Matrix is just getting started. But this could be the start of something big.

ps. Here’s a screencast with a quick tour of Live Matrix

Live Matrix Demo from Doug Freeman on Vimeo.

Web 3.0 Documentary by Kate Ray – I'm interviewed

Kate Ray has done a terrific job illustrating and explaining Web 3.0 and the Semantic Web in her new documentary. She interviews Tim Berners-Lee, Clay Shirky, me, and many others. If you’re interested in where the Web is headed, and the challenges and opportunities ahead, then you should watch this, and share it too!

Is Live Content More Valuable than On-Demand Content?

I have started blogging about a new concept that I call The Scheduled Web. The Scheduled Web is the next evolution of the Real-Time Web, in which it will become possible to actually navigate the time dimension of the Web more productively.

There is a popular misconception that on-demand content, such as archived video, is more valuable than live content. But in fact, this may not be the case.

Live content has built-in perishability that makes it potentially more valuable than on-demand content – if relevant audiences can find it while it is live. If a piece of high-demand content is only live for a short period of time it can attract more traffic in less time, provided that people who would want to participate interactively (or even transactively) in it are notified beforehand.

More demand in less time translates to higher advertising revenues, or higher prices in time-based sales like auctions. A series of high-demand live events could actually earn more revenues than a series of on-demand content releases in any given unit of time.

A live event is only live for some limited period of time, after which even though it may later be available in archived form, the event is finished, it is no longer a live event. If you want to get the live experience and be able to actually participate in a live event, you have to be there. It isn’t the same to watch it after the fact. And in some cases, for example auctions, sales, games, contests and chats, if you miss the event you can’t participate and may not even be able to access an archived version (if you even wanted to).

Live events are the best of both worlds for several reasons:

1. They have extra perishability because they are live, giving people a stronger incentive to participate synchronously when they are actually happening. Furthermore, if a live event is also interactive in some way, it is even more valuable to those who are present. A good example of this is American Idol, where for instance, the audience can participate in the voting process that selects finalists. Interactivity makes the show more engaging and gives viewers a sense of ownership and personal investment in the content.

2. Live events can also be archived and made available on-demand afterwards. The key to getting this double layer of value out of live events is to schedule them so that they can be found before or while they are actually live. This amplifies the initial demand for, and attendance at, the event, and also gives any archived version that follows an added layer of social virality.

At Live Matrix we believe it is incorrect to assume that the television model carries over directly to the Web. The Web is an entirely different medium because it is two-way, interactive, both synchronous and asynchronous, and distribution is open to anyone and portable across any device. Television over the Web is going to be different than TV on cable and satellite networks. The fact that consumers can consume Web video content asynchronously is a plus, but it doesn’t obviate the need or opportunity for live synchronous content on the Web. In fact, for any event that requires or even wants to leverage interactivity, live synchronous attendance by audience members is a key part of the experience.

There are many use-cases where live synchronous content consumption cannot be replaced by asynchronous content consumption — for example a live chat, or a time-limited sale or auction, or a multiplayer live game. Even in the case of video and audio there are many cases where live synchronous content is more valuable than asynchronous on-demand content. For example, who wants to watch the Super Bowl months after the game is over? Who really wants to watch a major presidential address or a press conference weeks later? Who wants to watch video of election coverage months after it’s decided? These kinds of “timely” events are live by their nature, and part of the value of consuming the content is the act of doing it in a timely manner.

The value of live interactive content becomes even clearer when you consider that content which is originally streamed live can generate more revenue over its lifetime than content that is simply recorded and released on-demand. The Scheduled Web will thus even improve traffic and revenues for on-demand content, if that content can be initiated as a live event, or at least paired with one in some way.

The value of the Scheduled Web will be realized as not simply a schedule of video content, but of all scheduled events of any type that take place on the Internet. While much of this content is valuable both when it initially goes live and on an ongoing basis as on-demand content after the fact, there is also a lot of content in Live Matrix that will be inherently and necessarily more valuable when it is live, such as sales and auctions or games.

In addition there is a new category of “exclusively live” online events that we may see emerge in 2011. These events will be one-time events, with no archived copies after they finish. They may be high-profile events where attendance requires paid admission for example. They will be marketed as special experiences – where not only do you have to be there to experience them, but where being there has special advantages, like being able to interact with others who are there and perhaps with the performers or celebrities involved as well. Some events may also offer backstage passes, or special break-out sessions as well.

For events like these — where the only value created is during the event’s live run — discovery must happen prior to or during the event for participation to take place. For these, the Scheduled Web is absolutely essential.

The Birth of the Scheduled Web

If 2010 was the year of the Real-Time Web, then 2011 is going to be the year that it evolves into the Scheduled Web.

The Real-Time Web happens in the now: it is spontaneous, overwhelming, and disorganized. Things just happen unpredictably and nobody really knows what to expect or what will happen when.

The Real-Time Web is something of a misnomer, however, because usually it’s not real-time at all –  it’s after-the-fact. Most people find out about things that happened on the Real-Time Web after they happen, or, if they are lucky, when they happen. There is no way to know what is going to happen before it happens; there is no way to prepare or ensure that you will be online when something happens on the Real-Time Web. It’s entirely hit-or-miss.

If we are going to truly realize the Real-Time Web vision, then “time” needs to be the primary focus. So far, the Real-Time Web has mainly just been about simultaneity and speed – for example how quickly people on Twitter can respond to an event in the real world such as the Haiti Earthquake or the Oscars.

This obsession with the present is a sign of the times, but it is also a form of collective myopia — the Real-Time Web really doesn’t include the past or the future – it exists in a kind of perpetual now. To put the “time” into Real-Time, we need to  provide a way to see the past, present and the future Real-Time Web at once.  For example, we need a way to search and browse the past, present, and the future of a stream – what happened, what is happening, and what is scheduled to happen in the future. And this is where what I am calling The Scheduled Web comes in. It’s the next step for the Real-Time Web.

Defining the Scheduled Web

With the Scheduled Web things will start to make sense again. There will be a return of some semblance of order thanks to schedule metadata that enables people (and software) to find out about upcoming things on the Web that matter to them, before they happen, and to find out about past things that matter, after they happen.

The Scheduled Web is a Web that has a schedule, or many schedules, which exist in some commonly accessible, open format. These schedules should be searchable, linkable, shareable, interactive, collaborative, and discoverable. And they should be able to apply to anything — not just video, but any kind of content or activity online.
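As a concrete illustration, a schedule entry could be as simple as the hypothetical record below; the field names and URL are invented for illustration and are not a Live Matrix format.

```javascript
// A hypothetical example of an open, machine-readable schedule entry for the
// Scheduled Web. Field names and the URL are illustrative only.
const scheduledEvent = {
  title: "Live Q&A with the band after the webcast concert",
  type: "live-video",               // could also be: auction, sale, chat, game, audio...
  url: "https://example.com/live/concert-qa", // hypothetical URL
  start: "2011-03-01T20:00:00Z",    // ISO 8601 timestamps make entries easy to index
  end: "2011-03-01T21:00:00Z",
  interactive: true,                // viewers can participate, not just watch
  tags: ["music", "concert", "q&a"],
};

// With entries like this, "what's on right now?" is a simple time-range filter:
function liveNow(events, now = new Date()) {
  return events.filter(
    e => new Date(e.start) <= now && now <= new Date(e.end)
  );
}

console.log(liveNow([scheduledEvent], new Date("2011-03-01T20:30:00Z")));
```

Once entries like this exist in an open, machine-readable form, questions like “what is live right now?” or “what is coming up tonight?” become simple queries over time ranges.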

Why is this needed? Well, consider this example. Imagine if there were no TV Guide on digital television. How would you navigate the constantly changing programming of more than 1,000 digital TV channels without an interactive program guide (IPG)? It would be extremely difficult to find shows in a timely manner. According to clickstream data from television set-top boxes, about 10% of all time spent watching TV is spent in the IPG environment. And that is not even counting additional time spent in on-demand guidance interfaces on DVRs. The point here is that guidance is key when you have lots of streams of content happening over time.

Now extend this same problem to the Web where there are literally millions of things happening every minute. These streams of content are not just limited to video. There are myriad types of real-time streams, everything from sales, auctions, and chats, to product launches, games, and audio, to streams of RSS feeds, Web pages appearing on Web sites, photos appearing on photo sites, software releases, announcements, etc.

Without some kind of guidance it is simply impossible to navigate the firehose of live online content streams on the Web efficiently. This firehose is too much to cope with in the present moment, let alone the past, or the future. This is what the Scheduled Web will solve.

By giving people a way to see into the past, present and future of the Real-Time Web, the Scheduled Web will enable the REAL Real-Time Web to be truly actualized. People will be able to know and plan in advance to actually be online when live events they care about take place.

Instead of missing that cool live Web concert or that auction for your favorite brand of shoes, simply because you didn’t know about it beforehand, you will be able to discover it in advance, RSVP, and get reminded before it starts — so you can be there and participate in the experience, right as it happens.

We are just beginning to see the emergence of the Scheduled Web. Two new examples of startups that are at work in the space are Clicker and Live Matrix.

  • Clicker, a site that mainly provides on-demand video clips of past TV episodes, this week launched a schedule for live video streams on the Web.
  • Live Matrix (my new startup), is soon to launch a schedule for all types of online events, not just video streams.

Some people have compared Live Matrix to Clicker; however, this is not a wholly accurate comparison. We have very different, although intersecting, goals.

While Clicker is an interesting play to compete with TV Guide and companies like Hulu, Live Matrix is creating a broader index of all the events taking place across the Scheduled Web, not just video/TV content events.

The insight behind Live Matrix is that there is much more to the Scheduled Web than video and TV content. The Web is not just about TV or video – it is about many different kinds of content.

Applying a TV metaphor to the Web is like trying to apply a print metaphor to tablet computing. While print has many positive qualities, tablet devices should not be limited just to text, should they? Likewise, while the TV metaphor has advantages, it doesn’t make sense to limit the experience of time or scheduled content on the Web just to video.

With this in mind, while Live Matrix includes scheduled live video streams, we view video and TV type content as just one of many different types of scheduled Web content that matter.

For example, Live Matrix also includes online shopping events like sales and auctions, which comprise an enormous segment of the Scheduled Web. As an illustration eBay alone lists around 10 million scheduled auctions and sales each day! Live Matrix also includes scheduling metadata for many other kinds of content — online games, online chats, online audio, and more.

Live Matrix is building something quite a bit broader than current narrow conceptions of the Real-Time Web, or the limited metaphor of TV on the Web. We are creating a way to navigate and search the full time dimension of the Web; we are building the schedule of the Web.

This will become a valuable, even essential, layer of metadata that just about every application, service and Internet surfer will make use of every day. Because after all, life happens in time and so does the Web. By adding metadata about time to the Web, Live Matrix will help make the Web – and particularly the Real-Time Web – easier to navigate.

Online vs. Offline Events

One of the key rules of Live Matrix is that, to be included in our schedule, an event must be consumable online. This means that it must be possible to access and participate in the event on an Internet-connected device.

Live Matrix is not a schedule of offline events or events that cannot be consumed or participated in using Internet-connected devices.

We made this rule because we believe that in the near-future almost everything interesting will, in fact, be consumable online, even if it has an offline component to it. We want to focus attention on those events which can be consumed on Internet-connected devices, so that if you have a connected device you can know that everything in Live Matrix can be accessed directly on your device. You don’t have to get in your car and drive to some physical venue, you don’t have to leave the Internet and go to some other device and network (like a TV and cable network).

Note the shift in emphasis here: We believe that the center of an increasing number of events is going to be online, and the offline world is going to increasingly become more peripheral.

For example, if a retail sale generates more revenues from online purchases than from physical in-store purchases, the center of the sale is really online and the physical store becomes peripheral. Similarly, if a live concert has 30,000 audience members in a physical stadium but 10,000,000 people attending it online, the bulk of the concert is in fact online. This is already starting to happen.

For example, the recent YouTube concert featuring U2 had 10 million live streams – that’s up to 10 million people in the live audience at one time, making it possibly the largest online concert in history; it’s certainly far more people than any physical stadium could accommodate. Similarly, online venues like Second Life and World of Warcraft can accommodate thousands of players interacting in the same virtual spaces – not only do these spaces have no physical analogue (they exist only in virtual space), but there are no physical spaces that could accommodate such large games. These are examples of how online events may start to eclipse offline events.

I’m not saying this trend is good or bad; I’m simply stating a fact of our changing participatory culture. The world is going increasingly online, and with this shift the center of our lives is moving increasingly online as well. It is this insight that gave my co-founder, Sanjay Reddy, and me the inspiration to start Live Matrix, and to begin building what we hope will be the backbone of the Scheduled Web.

Evri Ties the Knot with Twine — Twine CEO Comments and Analysis

Today I am announcing that my company, Radar Networks, and its flagship product, Twine, have been acquired by Evri. TechCrunch broke the story here.

This acquisition consolidates two leading providers of semantic discovery and search. It is also the culmination of a long and challenging venture to pioneer the adoption of the consumer Semantic Web.

As the CEO and founder of Radar Networks and Twine.com, it is difficult to describe what it feels like to have reached this milestone during what has been a tumultuous period of global recession. I am very proud of my loyal and dedicated team and the incredible work and accomplishments that we have made together, and I am grateful for the unflagging support of our investors, and the huge community of Twine users and supporters.

Selling Twine.com was not something we had planned on doing at this time, but given the economy and the fact that Twine.com is a long-term project that will require significant ongoing investment and work to reach our goals, it is the best decision for the business and our shareholders.

While we received several offers for the company, and were in discussions about M&A with multiple industry leading companies in media, search and social software, we eventually selected Evri.

The Twine team is joining Evri to continue our work there. The Evri team has assured me that Twine.com’s data and users are safe and sound and will be transitioned into the Evri.com service over time, in a manner that protects privacy and data, and is minimally disruptive. I believe they will handle this with care and respect for the Twine community.

It is always an emotional experience to sell a company. Building Twine.com has been a long, intense, challenging, rewarding, and all-consuming effort. There were incredible high points and some very deep lows along the way. But most of all, it has been an adventure I will never forget. I was fortunate to help pioneer a major new technology — the Semantic Web — with an amazing team, including many good friends. Bringing something as big, as ambitious, and as risky as Twine.com to market was exhilarating.

Twine has been one of the great learning experiences of my life. I am profoundly grateful to everyone I’ve worked with, and especially to those who supported us financially and personally with their moral support, ideas and advocacy.

I am also grateful to unsung heroes behind the project — the families of all of us who worked on it, who never failed to be supportive as we worked days, nights, weekends and vacations to bring Twine to market.

What I’m Doing Next

I will advise Evri through the transition, but will not be working full-time there. Instead, I will be turning my primary focus to several new projects, including some exciting new ventures:

  • Live Matrix, a new venture focusing on making the live Web more navigable. Live Matrix is led by Sanjay Reddy (CEO of Live Matrix; formerly SVP of Corp Dev for Gemstar TV Guide). Live Matrix is going to give the Web a new dimension: time. More news about this soon.
  • Klout, the leading provider of social analytics about influencers on Twitter and Facebook (which I was the first angel investor in, and which I now advise). Klout is a really hot  company and it’s growing fast.
  • I’m experimenting with a new way to grow ventures. It’s part incubator, part fund, part production company. I call it a Venture Production Studio. Through this initiative my partners and I are planning to produce a number of original startups, and selected outside startups as well. There is a huge gap in the early-stage arena, and to fill this we need to modify the economics and model of early stage venture investing.
  • I’m looking forward to working more on my non-profit interests, particularly those related to supporting democracy and human rights around the world, and one of my particular interests, Tibetan cultural preservation.
  • And last but not least, I’m getting married later this month, which may turn out to be my best project of all.

If you want to keep up with what I am thinking about and working on, you should follow me on Twitter at @novaspivack, and also keep up with my blog here at novaspivack.com and my mailing list (accessible in the upper right hand corner of this page).

The Story Behind the Story

In making this transition, it seems appropriate to tell the Twine.com story. This will provide some insight into how we got here, including some of our triumphs, and our mistakes, and some of the difficulties we faced along the way. Hopefully this will shed some light on the story behind the story, and may even be useful to other entrepreneurs out there in what is perhaps one of the most difficult venture capital and startup environments in history.

(Note: You may also be interested in viewing this presentation, “A Yarn About Twine” which covers the full history of the project with lots of pictures of various iterations of our work from the early semantic desktop app to Twine, to T2.)

The Early Years of the Project

The ideas that led to Twine were born in the 1990’s from my work as a co-founder of EarthWeb (which today continues as Dice.com), where among many things we prototyped a number of new knowledge-sharing and social networking tools, along with our primary work developing large Web portals and communities for customers, and eventually our own communities for IT professionals. My time with EarthWeb really helped me to understand the challenges and potential of sharing and growing knowledge socially on the Web. I became passionately interested in finding new ways to network people’s minds together, to solve information overload, and to enable the evolution of a future “global brain.”

After EarthWeb’s IPO I worked with SRI and Sarnoff to build their business incubator, nVention, and then eventually started my own incubator, Lucid Ventures, through which I co-founded Radar Networks with Kristin Thorisson, from the MIT Media Lab, and Jim Wissner (the continuing Chief Architect of Twine) in 2003. Our first implementation was a peer-to-peer Java-based knowledge sharing app called “Personal Radar.”

Personal Radar was a very cool app — it organized all the information on the desktop in a single semantic information space that was like an “iTunes for information” and then made it easy to share and annotate knowledge with others in a collaborative manner. There were some similarities to apps like Ray Ozzie’s Groove and the MIT Haystack project, but Personal Radar was built for consumers, entirely with Java, RDF, OWL and the standards of the emerging Semantic Web. You can see some screenshots of this early work in this slideshow, here.

But due to the collapse of the first Internet bubble there was simply no venture funding available at the time and so instead, we ended up working as subcontractors on the DARPA CALO project at SRI. This kept our research alive through the downturn and also introduced us to a true Who’s Who of AI and Semantic Web gurus who worked on the CALO project. We eventually helped SRI build OpenIRIS, a personal semantic desktop application, which had many similarities to Personal Radar. All of our work for CALO was open-sourced under the LGPL license.

Becoming a Venture-Funded Company

Deborah L. McGuinness, who was one of the co-designers of the OWL language (the Web Ontology Language, one of the foundations of the Semantic Web standards at the W3C), became one of our science advisers and kindly introduced us to Paul Allen, who invited us to present our work to his team at Vulcan Capital. The rest is history. Paul Allen and Ron Conway led an angel round to seed-fund us and we moved out of consulting to DARPA and began work on developing our own products and services.

Our long-term plan was to create a major online portal powered by the Semantic Web that would provide a new generation of Web-scale semantic search and discovery features to consumers. But for this to happen, first we had to build our own Web-scale commercial semantic applications platform, because there was no platform available at that time that could meet the requirements we had. In the process of building our platform numerous technical challenges had to be overcome.

At the time (the early 2000’s) there were few development tools in existence for creating ontologies or semantic applications, and in addition there were no commercial-quality databases capable of delivering high-performance Web-scale storage and retrieval of RDF triples. So we had to develop our own development tools, our own semantic applications framework, and our own federated high-performance semantic datastore.

This turned out to be a nearly endless amount of work. However, we were fortunate to have Jim Wissner as our lead technical architect and chief scientist. Under his guidance we went through several iterations and numerous technical breakthroughs, eventually developing the most powerful and developer-friendly semantic applications platform in the world. This led to the development of a portfolio of intellectual property that provides fundamental DNA for the Semantic Web.
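For readers unfamiliar with the underlying data model: RDF represents knowledge as subject-predicate-object triples, and queries are essentially pattern matches over those triples. The toy JavaScript sketch below illustrates only that data model; the identifiers are made up, and it bears no resemblance to the federated, Web-scale datastore described above.

```javascript
// Illustration of the RDF triple data model only -- not Radar Networks' actual
// semantic datastore. Each fact is a (subject, predicate, object) triple.
const triples = [
  { s: "ex:Twine",      p: "rdf:type",      o: "ex:SemanticApplication" },
  { s: "ex:Twine",      p: "ex:builtOn",    o: "ex:RadarPlatform" },
  { s: "ex:Alice",      p: "ex:bookmarked", o: "ex:Article123" },
  { s: "ex:Article123", p: "ex:about",      o: "ex:SemanticWeb" },
];

// A tiny triple-pattern matcher: null acts as a wildcard, loosely analogous
// to a variable in a SPARQL query.
function match(pattern) {
  return triples.filter(t =>
    (pattern.s == null || t.s === pattern.s) &&
    (pattern.p == null || t.p === pattern.p) &&
    (pattern.o == null || t.o === pattern.o)
  );
}

// "What do we know about ex:Twine?"
console.log(match({ s: "ex:Twine" }));
// "Which things are about the Semantic Web?"
console.log(match({ p: "ex:about", o: "ex:SemanticWeb" }));
```

The hard part, and the reason we had to build our own platform, was doing this kind of matching efficiently over billions of triples federated across many machines.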

During this process we raised a Series A round led by Vulcan Capital and Leapfrog Ventures, and our team was joined by interface designer and product management expert, Chris Jones (now leading strategy at HotStudio, a boutique design and user-experience firm in San Francisco). Under Chris’ guidance we developed Twine.com, our first application built on our semantic platform.

The mission of Twine.com was to help people keep up with their interests more efficiently, using the Semantic Web. The basic idea was that you could add content to Twine (most commonly by bookmarking it into the site, but also by authoring directly into it), and then Twine would use natural language processing and analysis, statistical methods, and graph and social network analysis, to automatically store, organize, link and semantically tag the content into various topical areas.

These topics could easily be followed by other users who wanted to keep up with specific types of content or interests. So basically you could author or add stuff to Twine and it would then do the work of making sense of it, organizing it, and helping you share it with others who were interested. The data was stored semantically and connected to ontologies, so that it could then be searched and reused in new ways.
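As a rough sketch of that flow, here is a toy JavaScript version of the ingest-and-organize step. It is hypothetical: the keyword matching below stands in for the natural language processing, statistical, and graph analysis described above, and none of the names correspond to Twine’s real code.

```javascript
// Hypothetical sketch of the kind of pipeline described above -- not Twine's
// actual implementation. Keyword matching stands in for the real NLP,
// statistical, and graph analysis.

const knownTopics = {
  "semantic web": ["rdf", "ontology", "owl", "semantic"],
  "startups": ["funding", "venture", "founder"],
};

// Stand-in for entity/topic extraction: match known keywords in the text.
function classifyTopics(text) {
  const lower = text.toLowerCase();
  return Object.keys(knownTopics).filter(topic =>
    knownTopics[topic].some(keyword => lower.includes(keyword))
  );
}

// "Twines" are interest-based collections that other users can follow.
const twines = {};

function ingestBookmark(url, text, user) {
  const topics = classifyTopics(text);
  const item = { url, addedBy: user, topics, addedAt: new Date() };
  for (const topic of topics) {
    (twines[topic] = twines[topic] || []).push(item); // file under each matching twine
  }
  return item;
}

ingestBookmark("http://example.com/rdf-intro", "An introduction to RDF and OWL", "alice");
console.log(twines["semantic web"]); // the bookmarked item now appears in that topical twine
```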

With the help of Lew Tucker, Sonja Erickson and Candice Nobles, as well as an amazing team of engineers, product managers, systems admins and designers, Twine was announced at the Web 2.0 Summit in October of 2007 and went into full public beta in Q1 of 2008. Twine was well-received by the press and early-adopter users.

Soon after our initial beta launch we raised a Series B round, led by Vulcan Capital and Velocity Interactive Group (now named Fuse Capital), as well as DFJ. This gave us the capital to begin to grow Twine.com rapidly to become the major online destination we envisioned.

In the course of this work we made a number of additional technical breakthroughs, resulting in more than 20 patent filings in total, including several fundamental patents related to semantic data management, semantic portals, semantic social networking, semantic recommendations, semantic advertising, and semantic search.

Four of those patents have been granted so far and the rest are still pending — and perhaps the most interesting of these patents are related to our most recent work on “T2” and are not yet visible.

At the time of beta launch and for almost six months after, Twine was still very much a work in progress. Fortunately our users and the press were fairly forgiving as we worked through evolving the GUI and feature set from what was initially just slightly better than an alpha site to the highly refined and graphical UI we have today.

During these early days of Twine.com we were fortunate to have a devoted user-base and this became a thriving community of power-users who really helped us to refine the product and develop great content within it.

Rapid Growth, and Scaling Challenges

As Twine grew the community went through many changes and some growing pains, and eventually crossed the chasm to a more mainstream user-base. Within less than a year from launch the site grew to around 3 million monthly visitors, 300,000 registered users, 25,000 “twines” about various interests, and almost 5 million pieces of user-contributed content. It was on its way to becoming the largest semantic web on the Web.

By all accounts Twine was looking like a potential “hit.” During this period the company staff increased to more than 40 people (inclusive of contractors and offshore teams) and our monthly burn rate increased to aggressive levels of spending to keep up with growth.

Despite this growth and spending we still could not keep up with demand for new features and at times we experienced major scaling and performance challenges. We had always planned for several more iterations of our backend architecture to facilitate scaling the system. But now we could see the writing on the wall — we had to begin to develop a more powerful, more scalable backend for Twine, much sooner than we had expected we would need to.

This required us to increase our engineering spending further in order to simultaneously support the live version of Twine and its very substantial backend, and run a parallel development team working on the next generation of the backend and the next version of Twine on top of it. Running multiple development teams instead of one was a challenging and costly endeavor. The engineering team was stretched thin and we were all putting in 12 to 15 hour days every day.

Breakthrough to “T2”

We began to work in earnest on a new iteration of our back-end architecture and application framework — one that could scale fast enough to keep up with our unexpectedly fast growth rate and the increasing demands on our servers that this was causing.

This initiative yielded unexpected fruit. Not only did we solve our scaling problems, but we were able to do so to such a degree that entirely new possibilities were opened up to us — ones that had previously been out of reach for purely technical reasons. In particular, semantic search.

Semantic search had always been a long-term goal of ours, however, in the first version of Twine (the one that is currently online) search was our weakest feature area, due to the challenge of scaling a semantic datastore to handle hundreds of billions of triples. But our user-studies revealed that it was in fact the feature our users wanted us to develop the most – search slowly became the dominant paradigm within Twine, especially when the content in our system reached critical mass.

Our new architecture initiative solved the semantic search problem to such a degree that we realized that not only could we scale Twine.com, we could scale it to eventually become a semantic search engine for the entire Web.

Instead of relying on users to crowdsource only a subset of the best content into our index, we could crawl large portions of the Web automatically and ingest millions and millions of Web pages, process them, and make them semantically searchable — using a true W3C Semantic Web compliant backend. (Note: Why did we even attempt to do this? We believed strongly in supporting open-standards for the Semantic Web, despite the fact that they posed major technical challenges and required tools that did not exist yet, because they promised to enable semantic application and data interoperability, one of the main potential benefits of the Semantic Web).

Based on our newfound ability to do Web-scale semantic search, we began planning the next version of Twine — Twine 2.0 (“T2”), with the help of Bob Morgan, Mark Erickson, Sasi Reddy, and a team of great designers.

The new T2 plan would merge new faceted semantic search features with the existing social, personalization and knowledge management features of Twine 1.0. It would be the best of both worlds: semantic search + social search. We began working intensively on developing T2, along with a new set of hosted developer tools that would make it easy for any webmaster to add their site to our semantic index. We were certain that with T2 we had finally “cracked the code” to the Semantic Web — we had a product plan and a strategy that could really bring the Semantic Web to everyone on the Web. It elegantly solved the key challenges to adoption, and on a technical level, by using SOLR instead of a giant triplestore, we were able to scale to unprecedented levels. It was an exciting plan and everyone on the team was confident in the direction.
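
For readers who want something concrete, here is a rough sketch of the flattening idea behind this approach: semantically tagged documents become Solr documents whose fields can be faceted on. It assumes a local Solr core and the third-party pysolr client; the core name and field names are hypothetical, not Twine's actual schema.

```python
# Hedged sketch: index semantically tagged documents into Solr so that facets
# (topic, entity type, etc.) can stand in for heavyweight triplestore queries.
# Assumes a Solr core named "t2" running locally; field names are hypothetical.
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/t2", timeout=10)

solr.add([
    {
        "id": "doc-1",
        "title": "Best pizza in San Francisco",
        "topic": ["food", "restaurants"],
        "entity_place": ["San Francisco"],
        "body": "A guide to the city's best pizzerias...",
    },
], commit=True)

# Faceted semantic search: a free-text query plus facet counts per field.
results = solr.search("pizza", **{
    "facet": "true",
    "facet.field": ["topic", "entity_place"],
    "rows": 10,
})
for doc in results:
    print(doc["id"], doc["title"])
```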

To see screenshots that demo T2 and our hosted development tools click here.

The Global Recession

Our growth was fast, and so was our spending, but at the time this seemed logical because the future looked bright and we were in a race to keep ahead of our own curve. We were quickly nearing a point where we would soon need to raise another round of funding to sustain our pace, but we were confident that with our growth trends steadily increasing and our exciting plans for T2, the necessary funding would be forthcoming at favorable valuations.

We were wrong.

The global economy crashed unexpectedly, throwing a major curveball in our path. We had not planned on that happening and it certainly was inconvenient to say the least.

The recession not only hit Wall Street, it hit Silicon Valley. Venture capital funding dried up almost overnight. VC funds sent alarming letters to their portfolio companies warning of dire financial turmoil ahead. Many startups were forced to close their doors, while others made drastic sudden layoffs for better or for worse. We too made spending cuts, but we were limited in our ability to slash expenses until the new T2 platform could be completed. Once that was done, we would be able to move Twine to a much more scalable and less costly architecture, and we would no longer need parallel development teams. But until that happened, we still had to maintain a sizeable infrastructure and engineering effort.

As the recession dragged on, and the clock kept ticking down, the urgency of raising a C round increased, and finally we were faced with a painful decision. We had to drastically reduce our spending in order to wait out the recession and live to raise more funding in the future.

Unfortunately, the only way to accomplish such a drastic reduction in spending was to lay off almost 30% of our staff and cut our monthly spending by almost 40%. But by doing that we could not possibly continue to work on as many fronts as we had been doing. The result was that we had to stop most work on Twine 1.0 (the version that was currently online) and focus all our remaining development cycles and spending on the team needed to continue our work on T2.

This was extremely painful for me as the CEO, and for everyone on our team. But it was necessary for the survival of the business and it did buy us valuable time. However, it also slowed us down tremendously. The irony of the decision was that while it reduced our burn rate, it reduced productivity and cost us so much time that in the end it may have cost us roughly the same amount of money anyway.

While much of our traffic had been organic and direct, we also had a number of marketing partnerships and PR initiatives that we had to terminate. In addition, as part of this layoff we lost our amazing and talented marketing team, as well as half our product management team, our entire design team, our entire marketing and PR budget, and much of our support and community management team. This made it difficult to continue to promote the site, launch new features, fix bugs, or to support our existing online community. And as a result the service began to decline and usage declined along with it.

To make matters worse, at around the same time as we were making these drastic cuts, Google decided to de-index Twine. To this day we still are not sure why they decided to do this – it could have been that Google suddenly decided we were a competitive search engine, or it could be that their algorithm changed, or it could be that there was some error in our HTML markup that may have caused an indexing problem. We had literally millions of pages of topical user-generated content – but all of a sudden we saw drastic reductions in the number of pages being indexed, and in the ranking of those pages. This caused a very significant drop in organic traffic. With what little team I had remaining we spent time petitioning Google and trying to get reinstated. But we never managed to return to our former levels of index prominence.

Eventually, with all these obstacles, and the fact that we had to focus our remaining budget on T2, we put Twine.com on auto-pilot and let the traffic fall off, believing that we would have the opportunity to win it back once we launched the next version. While painful to watch, this reduction in traffic and user activity at least had the benefit of reducing the pressure on the engineering team to scale the system and support it under load, giving us time to focus all our energy on getting T2 finished and on raising more funds.

But the recession dragged on and on and on, without end. VC’s remained extremely conservative and risk-averse. Meanwhile, we focused our internal work on growing a large semantic index of the Web in T2, vertical by vertical, starting with food, then games, and then many other topics (technology, health, sports, etc.). We were quite confident that if we could bring T2 to market it would be a turning point for Web search, and funding would follow.

Meanwhile we met with VC’s in earnest. But nobody was able to invest in anything due to the recession. Furthermore we were a pre-revenue company working on a risky advanced technology and VC partnerships were far too terrified by the recession to make such a bet. We encountered the dreaded “wait and see” response.

The only way we could get the funding we needed to continue was to launch T2, grow it, and generate revenues from it, but the only way we could reach those milestones was to launch T2 in the first place: a classic catch-22 situation.

We took comfort in the fact that we were not alone in this predicament. Almost every tech company at our stage was facing similar funding challenges. However, we were determined to find a solution despite the obstacles in our path.

Selling the Business

Had the recession not happened, I believe we would have raised a strong C round based on the momentum of the product and our technical achievements. Unfortunately, we, like many other early-stage technology ventures, found ourselves in the worst capital crunch in decades.

We eventually came to the conclusion that there was no viable path for the company but to use the runway we had left to sell to another entity that was more able to fund the ongoing development and marketing necessary to monetize T2.

While selling the company had always been a desirable exit strategy, we had hoped to do it after the launch and growth of T2. However, we could not afford to wait any longer. With some short-term bridge funding from our existing investors, we worked with Growth Point Technology Partners to sell the company.

We met with a number of the leading Internet and media companies and received numerous offers. In the end, the best and most strategically compatible offer came from Evri, one of our sibling companies in Vulcan Capital’s portfolio. While larger and more established companies made very compelling offers, joining Evri was simply the best option.

And so we find ourselves at the present day. We got the best deal possible for our shareholders given the circumstances. Twine.com, my team, our users and their data are safe and sound. As an entrepreneur and CEO it is, as one advisor put it, of the utmost importance to always keep the company moving forward. I feel that I did manage to achieve this under extremely difficult economic circumstances. And for that I am grateful.

Outlook for the Semantic Web

I’ve been one of the most outspoken advocates of the Semantic Web during my tenure at Twine. So what about my outlook for the Semantic Web now that Twine is being sold and I’m starting to do other things? Do I still believe in the promise of the Semantic Web vision? Where is it going? These are questions I expect to be asked, so I will attempt to answer them here.

I continue to believe in the promise of semantic technologies, and in particular the approach of the W3C semantic web standards (RDF, OWL, SPARQL). That said, having tried to bring them to market as hard as anyone ever has, I can truly say they present significant challenges both to developers and to end-users. These challenges all stem from one underlying problem: Data storage.

Existing SQL databases are not optimal for large-scale, high-performance semantic data storage and retrieval, yet triplestores are still not ready for prime time. New graph databases and column stores show a lot of promise, but they are only beginning to emerge. This situation makes it incredibly difficult to bring Web-scale semantic applications to market cost-effectively.
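
A toy example makes the storage problem tangible. If you store triples as rows in a relational table, even a simple two-hop question turns into a chain of self-joins, which is exactly what breaks down at hundreds of billions of rows. The schema and data below are purely illustrative.

```python
# Toy illustration of why generic SQL databases strain under semantic data:
# every statement is one row in a triples table, so even a two-hop question
# becomes a chain of self-joins over an enormous table.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:Twine", "ex:builtOn", "ex:SemanticPlatform"),
    ("ex:SemanticPlatform", "ex:uses", "ex:RDF"),
])

# "What standards does Twine's platform use?" -- one self-join per hop.
rows = db.execute("""
    SELECT t2.o
    FROM triples t1
    JOIN triples t2 ON t1.o = t2.s
    WHERE t1.s = 'ex:Twine' AND t1.p = 'ex:builtOn' AND t2.p = 'ex:uses'
""").fetchall()
print(rows)   # [('ex:RDF',)]
```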

Enterprise semantic applications are much more feasible today, however, because existing and emerging databases and semantic storage solutions do scale to enterprise levels. But for consumer-grade Web services at enormous scale, there are still challenges. This was the single greatest technical obstacle that Twine faced, and it cost us a large amount of our venture funding to surmount. We eventually found a solution with our T2 architecture, but it is still not a general solution for all types of applications.

I have recently seen some new graph data storage products that may provide the levels of scale and performance needed, but pricing has not been determined yet. In short, storage and retrieval of semantic graph datasets is a big unsolved challenge that is holding back the entire industry. We need federated database systems that can handle hundreds of billions to trillions of triples under high load conditions, in the cloud, on commodity hardware and open source software. Only then will it be affordable to make semantic applications and services at Web-scale.

I believe that semantic metadata is essential for the growth and evolution of the Web. It is one of the only ways we can hope to dig out from the increasing problem of information overload. It is one of the only ways to make search, discovery, and collaboration smart enough to really be significantly better than it is today.

But the notion that everyone will learn and adopt standards for creating this metadata themselves is flawed in my opinion. They won’t. Instead, we must focus on solutions (like Twine and Evri) that make this metadata automatically by analyzing content semantically. I believe this is the most practical approach to bringing the value of semantic search and discovery to consumers, as well as Webmasters and content providers around the Web.

The major search engines are all working on various forms of semantic search, but to my knowledge none of them are fully supporting the W3C standards for the Semantic Web. In some cases this is because they are attempting to co-opt the standards for their own competitive advantage, and in other cases it is because it is simply easier not to use them. But in taking the easier path, they are giving up the long-term potential gains of a truly open and interoperable semantic ecosystem.

I do believe that whoever enables this open semantic ecosystem first will win in the end — because it will have greater and faster network effects than any closed competing system. That is the promise and beauty of open standards: everyone can feel safe using them since no single commercial interest controls them. At least that’s the vision I see for the Semantic Web.

As far as where the Semantic Web will add the most value in years to come, I think we will see it appear in some new areas. First and foremost is e-commerce, an area that is ripe with structured data that needs to be normalized, integrated and made more searchable. This is perhaps the most potentially profitable and immediately useful application of semantic technologies. It’s also one where there has been very little innovation. But imagine if eBay or Amazon or Salesforce.com provided open-standards-compliant semantic metadata and semantic search across all their data.

Another important opportunity is search and SEO — these are the areas that Twine’s T2 project focused on, by enabling webmasters to easily and semi-automatically add semantic descriptions of their content into search indexes, without forcing them to learn RDF and OWL and do it manually. This would create a better SEO ecosystem and would be beneficial not only to content providers and search engines, but also to advertisers. This is the approach that I believe the major search engines should take.
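
As a hypothetical illustration of the “no RDF required” idea (not Twine’s actual hosted tooling), a helper like the one below could take a few plain fields from a webmaster and emit RDFa-annotated HTML using the Dublin Core vocabulary, ready to paste into a page.

```python
# Hedged sketch of the "no RDF required" idea: a webmaster fills in a few plain
# fields and a tool emits RDFa-annotated HTML for them. This is not any real
# product's tooling -- just an illustration using the Dublin Core vocabulary.
from html import escape

def rdfa_snippet(url, title, subjects):
    subject_spans = "".join(
        f'<span property="dc:subject">{escape(s)}</span> ' for s in subjects
    )
    return (
        f'<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="{escape(url)}">\n'
        f'  <span property="dc:title">{escape(title)}</span>\n'
        f'  {subject_spans}\n'
        f'</div>'
    )

print(rdfa_snippet("http://example.com/recipes/42",
                   "Wood-fired pizza dough",
                   ["cooking", "pizza", "recipes"]))
```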

Another area where semantics could add a lot of value is social media — by providing semantic descriptions of user profiles and user profile data, as well as social relationships on the Web, it would be possible to integrate and search across all social networks in a unified manner.

Finally, another area where semantics will be beneficial is to enable easier integration of datasets and applications around the Web — currently every database is a separate island, but by using the Semantic Web appropriately data can be freed from databases and easily reused, remixed and repurposed by other applications. I look forward to the promise of a truly open data layer on the Web, when the Web becomes essentially one big open database that all applications can use.

Lessons Learned and Advice for Startups

While the outcome for Twine was decent under the circumstances, and was certainly far better than the alternative of simply running out of money, I do wonder how it could have been different. I ask myself what I learned and what I would do differently if I had the chance or could go back in time.

I think the most important lessons I learned, and the advice that I would give to other entrepreneurs can be summarized with a few key points:

  1. Raise as little venture capital as possible. Raise less than you need, not more than you need. Don’t raise extra capital just because it is available. Later on it will make it harder to raise further capital when you really need it. If you can avoid raising venture capital at all, do so. It comes with many strings attached. Angel funding is far preferable. But best of all, self-fund from revenues as early as you can, if possible. If you must raise venture capital, raise as little as you can get by on — even if they offer you more. But make sure you have at least enough to reach your next funding round — and assume that it will take twice as long to close as you think. It is no easy task to get a startup funded and launched in this economy — the odds are not in your favor — so play defense, not offense, until conditions improve (years from now).
  2. Build for lower exits. Design your business model and capital strategy so that you can deliver a good ROI to your investors at an exit under $30mm. Exit prices are going lower, not higher. There is less competition and fewer buyers and they know it’s a buyer’s market. So make sure your capital strategy gives the option to sell in lower price ranges. If you raise too much you create a situation where you either have to sell at a loss, or raise even more funding which only makes the exit goal that much harder to reach.
  3. Spend less. Spend less than you want to, less than you need to, and less than you can. When you are flush with capital it is tempting to spend it and grow aggressively, but don’t. Assume the market will crash — downturns are more frequent and last longer than they used to. Expect that. Plan on it. And make sure you keep enough capital in reserve to spend 9 to 12 months raising your next round, because that is how long it takes in this economy to get a round done.
  4. Don’t rely on user-traction to raise funding. You cannot assume that user traction is enough to get your next round done. Even millions of users and exponential growth are not enough. VC’s and their investment committees want to see revenues, and particularly at least breakeven revenues. A large service that isn’t bringing in revenues yet is not a business, it’s an experiment. Perhaps it’s one that someone will buy, but if you can’t find a buyer then what? Don’t assume that VC’s will fund it. They won’t. Venture capital investing has changed dramatically — early stage and late stage deals are the only deals that are getting real funding. Mid-stage companies are simply left to die, unless they are profitable or will soon be profitable.
  5. Don’t be afraid to downsize when you have to. It sucks to fire people, but it’s sometimes simply necessary. One of the worst mistakes is to not fire people who should be fired, or to not do layoffs when the business needs require it. You lose credibility as a leader if you don’t act decisively. Often friendships and personal loyalties prevent or delay leaders from firing people that really should be fired. While friendship and loyalty are noble they unfortunately are not always the best thing for the business. It’s better for everyone to take their medicine sooner rather than later. Your team knows who should be fired. Your team knows when layoffs are needed. Ask them. Then do it. If you don’t feel comfortable firing people, or you can’t do it, or you don’t do it when you need to, don’t be the CEO.
  6. Develop cheaply, but still pay market salaries. Use offshore development resources, or locate your engineering team outside of the main “tech hub” cities. It is simply too expensive to compete with large public and private tech companies to pay top dollar for engineering talent in places like San Francisco and Silicon Valley. The cost of top-level engineers is too high in major cities to be affordable and the competition to hire and retain them is intense. If you can get engineers to work for free or for half price then perhaps you can do it, but I believe you get what you pay for. So rather than skimp on salaries, pay people market salaries, but do it where market salaries are more affordable.
  7. Only innovate on one frontier at a time. For example, either innovate by making a new platform, or a new application, or a new business model. Don’t do all of these at once, it’s just too hard. If you want to make a new platform, just focus on that, don’t try to make an application too. If you want to make a new application, use an existing platform rather than also building a platform for it. If you want to make a new business model, use an existing application and platform — they can be ones you have built in the past, but don’t attempt to do it all at once. If you must do all three, do them sequentially, and make sure you can hit cash flow breakeven at each stage, with each one. Otherwise you’re at risk in this economy.

I hope that this advice is of some use to entrepreneurs (and VC’s) who are reading this. I’ve personally made all these mistakes myself, so I am speaking from experience. Hopefully I can spare you the trouble of having to learn these lessons the hard way.

What we did Well

I’ve spent considerable time in this article focusing on what didn’t go according to plan, and the mistakes we’ve learned from. But it’s also important to point out what we did right. I’m proud of the fact that Twine accomplished many milestones, including:

  • Pioneering the Semantic Web and leading the charge to make it a mainstream topic of conversation.
  • Creating the most powerful, developer-friendly platform for the Semantic Web.
  • Successfully completing our work on CALO, the largest Semantic Web project in the US.
  • Launching the first mainstream consumer application of the Semantic Web.
  • Having a very successful launch, covered by hundreds of articles.
  • Gaining users extremely rapidly — faster than Twitter did in its early years.
  • Hiring and retaining an incredible team of industry veterans.
  • Raising nearly $24mm of venture capital over 2 rounds, because our plan was so promising.
  • Developing more than 20 patents, several of which are fundamentally important for the Semantic Web field.
  • Surviving two major economic bubbles and the downturns that followed.
  • Innovating and most of all, adapting to change rapidly.
  • Breaking through to T2 — a truly awesome technological innovation for Web-scale semantic search.
  • Selling the company in one of the most difficult economic environments in history.

I am proud of what we accomplished with Twine. It’s been “a long strange trip” but one that has been full of excitement and accomplishments to remember.

Conclusions

If you’ve actually read this far, thank you. This is a big article, but after all, Twine is a big project – one that lasted nearly 5 years (or 9 years if you include our original research phase). I’m still bullish on the Semantic Web, and genuinely very enthusiastic about what Evri will do with Twine.com going forward.

Again I want to thank the hundreds of people who have helped make Twine possible over the years – but in particular the members of our technical and management team who went far beyond the call of duty to get us to the deal we have reached with Evri.

While this is certainly the end of an era, I believe that this story has only just begun. The first chapters are complete and now we are moving into a new era. Much work remains to be done and there are certainly still challenges and unknowns, but progress continues and the Semantic Web is here to stay.

The Global Brain is About to Wake Up

The emerging realtime Web is not only going to speed up the Web and our lives, it is going to bring about a kind of awakening of our collective Global Brain. It’s going to change how many things happen online, but it’s also going to change how we see and understand what the Web is doing. By speeding up the Web, it will cause processes that used to take weeks or months to unfold online to happen in days or even minutes. And this will bring these processes to the human scale — to the scale of our human “now” — making it possible for us to be aware of larger collective processes than before. We have until now been watching the Web in slow motion. As it speeds up, we will begin to see and understand what’s taking place on the Web in a whole new way.

This process of quickening is part of a larger trend which I and others call “Nowism.” You can read more of my thoughts about Nowism here. Nowism is an orientation that is gaining momentum and will help to shape this decade, and in particular, how the Web unfolds. It is the idea that the present timeframe (“the now”) is getting more important, shorter and also more information-rich. As this happens our civilization is becoming more focused on the now, and less focused on the past or the future. Simply keeping up with the present is becoming an all-consuming challenge: both a threat and an opportunity.

The realtime Web — what I call “The Stream” (see “Welcome to the Stream”) — is changing the unit of now. It’s making it shorter. The now is the span of time that we have to be aware of to be effective in our work and lives, and it is getting shorter. On a personal level the now is getting shorter and denser — more information and change is packed into shorter spans of time; a single minute on Twitter is overflowing with potentially relevant messages and links. In business as well, the now is getting shorter and denser — it used to be about the size of a fiscal quarter, then it became a month, then a week, then a day, and now it is probably about half a day in span. Soon it will be just a few hours.

To keep up with what is going on we have to check in with the world in at least half-day chunks. Important news breaks about once or twice a day. Trends on Twitter take about a day to develop too. So basically, you can afford to just check the news and the real-time Web once or twice a day and still get by. But that’s going to change. As the now gets shorter, we’ll have to check in more frequently to keep abreast of change. As the Stream picks up speed in the middle of this decade, remaining competitive will require near-constant monitoring — we will have to always be connected to, and watching, the real-time Web and our personal streams. Being offline at all will risk missing out on big important trends, threats and opportunities that emerge and develop within minutes or hours. But nobody is capable of tracking the Stream 24/7 — we must at least take breaks to eat and sleep. And this is a problem.

Big Changes to the Web Coming Soon…

With Nowism comes a faster Web, and this will lead to big changes in how we do various activities on the Web:

  • We will spend less time searching. Nowism pushes us to find better alternatives to search, or to eliminate search entirely, because people don’t have time to search anymore. We need tools that do the searching for us and that help with decision support so we don’t have to spend so much of our scarce time doing that. See my article on “Eliminating the Need for Search — Help Engines” for more about that.
  • Monitoring (not searching) the real-time stream becomes more important. We need to stay constantly vigilant about what’s happening, what’s trending. We need to be alerted to the important stuff (to us), and we need a way to filter out what’s not important to us. A filter based on the influence of people and tweets, and/or the time dynamics of memes, will probably be necessary. Monitoring the real-time stream effectively is different from searching it. I see more value in real-time monitoring than realtime search — I haven’t seen any monitoring tools for Twitter that are smart enough to give me just the content I want yet. There’s a real business opportunity there.
  • The return of agents. Intelligent agents are going to come back. To monitor the realtime Web effectively each of us will need online intelligent agents that can help us — because we don’t have time, and even if we did, there’s just too much information to sift through.
  • Influence becomes more important than relevance. Advertisers and marketers will look for the most influential parties (individuals or groups) on Twitter and other social media to connect with and work through. But to do this there has to be an effective way to measure influence. One service that’s providing a solution for this (which I’ve angel invested in and advise) is Klout.com – they measure influence per person per topic. I think that’s a good start.
  • Filtering content by influence. We also will need a way to find the most influential content. Influential content could be the content most RT’d or most RT’d by most influential people. It would be much less noisy to be able to see only the more influential tweets of people I follow. If a tweet gets RT’d a lot, or is RT’d by really influential people, then I want to see it. If not, then only if it’s really important (based on some rule). This will be the only way to cope with the information overload of the real-time Web and keep up with it effectively. I don’t know of anyone providing a service for this yet. It’s a business opportunity.
  • Nowness as a measure of value of content. We will need a new form of ranking of results by “nowness” – how timely they are now. So for example, in real-time search engines we shouldn’t rank results merely by how recent they are, but also by how timely, influential, and “hot” they are now. See my article from years ago on “A Physics of Ideas” for more about that. Real-time search companies should think of themselves as real-time monitoring companies — that’s what they are really going to be used for in the end. Only the real-time search ventures that think of themselves this way are going to survive the conceptual paradigm shift that the realtime Web is bringing about. In a realtime context, search is actually too late — once something has happened in the past it really is not that important anymore — what matters is current awareness: discovering the trends NOW. To do that one has to analyze the present, and the very recent past, much more than searching the longer-term past. The focus has to be on real-time or near-real-time analytics, statistical analysis, topic and trend detection, prediction, filtering and alerting. Not search. (A minimal scoring sketch of this idea follows this list.)
  • New ways to understand and navigate the now. We will need a way to visualize and navigate the now. I’m helping to incubate a stealth startup venture, Live Matrix, that is working on that. It hasn’t launched yet. It’s cool stuff. More on that in the future when they launch.
  • New tools for browsing the Stream. New tools will emerge for making the realtime Web more compelling and smarter. I’m working on incubating some new stealth startups in this area as well. They’re very early-stage, so I can’t say more about them yet.
  • The merger of semantics with the realtime Web. We need to make the realtime Web semantic — as well as the rest of the Web — in order to make it easier for software to make sense of it for us. This is the best approach to increasing the signal-to-noise ratio of the content we have to look at, whether searching or monitoring. The Semantic Web standards of the W3C are key to this. I’ve written a long manifesto on this in “Minding The Planet: The Meaning and Future of the Semantic Web” if you’re really interested in that topic.
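
As promised above, here is a minimal sketch of the influence-filtering and “nowness” ideas from this list. The weights, half-life and threshold are hypothetical placeholders; this captures the spirit of the approach, not any product’s actual algorithm.

```python
# Minimal sketch of influence-weighted filtering and "nowness" ranking.
# All constants are hypothetical; influence is assumed to be a value in [0, 1].
import math
import time

def nowness_score(created_at, retweets, author_influence, half_life_secs=3600.0):
    """Blend recency (exponential decay), spread, and author influence into one score."""
    age = max(time.time() - created_at, 0.0)
    recency = math.exp(-math.log(2) * age / half_life_secs)  # 1.0 now, 0.5 after one half-life
    spread = math.log1p(retweets)                            # diminishing returns on retweets
    return recency * (1.0 + spread) * (0.5 + author_influence)

def filter_stream(items, threshold=0.75):
    """Keep only items that are 'hot' enough right now, ranked by nowness."""
    scored = [(nowness_score(i["created_at"], i["retweets"], i["influence"]), i)
              for i in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for score, item in scored if score >= threshold]
```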

Faster Leads to Smarter

As the realtime web unfolds and speeds up, I think it will also have a big impact on what some people call “The Global Brain.” The Global Brain has always existed, but in recent times it has been experiencing a series of major upgrades — particularly around how connected, affordable, accessible and fast it is. First we got phones and faxes, then the Internet, the PC and the Web, and now the real-time Web and the Semantic Web. All of these recent changes are making the Global Brain faster and more richly interconnected. And this makes it smarter. For more about my thoughts on the Global Brain, see the two talks I have given on the subject.

What’s most interesting to me is that as the rate of communication and messaging on the Web approaches near-real time, we may see a kind of phase change take place – a much smarter Global Brain will begin to appear out of the chaos. In other words, the speed of collective thinking is as important as the complexity or sophistication of collective thinking in making the Global Brain significantly more intelligent. Put another way, I’m proposing that there is a sort of critical speed of collective thinking, before which the Global Brain seems like just a crowd of actors chaotically flocking around memes, and after which the Global Brain makes big leaps — instead of seeming like a chaotic crowd, it starts to look more like a group organized around certain activities — it is able to respond to change faster, and optimize and even do things collectively more productively than a random crowd could.

This is kind of like film, or animation. When you watch a movie or animation you are really watching a rapid series of frames. This gives the illusion of there being cohesive, continuous characters, things and worlds in the movie — but really they aren’t there at all, it’s just an illusion — our brains put these scenes together and start to recognize and follow higher-order patterns. A certain shape appears to maintain itself and move around relative to other shapes, and we name it with a certain label — but there isn’t really something there, let alone something moving or interacting — there are just frames flicking by rapidly. It turns out that after a critical frame rate (around 20 to 60 frames per second) the human brain stops seeing individual frames and starts seeing a continuous movie. When you flip pages fast enough it appears to be a coherent animation, and then we start seeing things “moving within the sequence” of frames. In the same way, as the unit of time of the real-time Web shrinks (that is, as its speed increases), its behavior will start to seem more continuous and smarter — we won’t see separate chunks of time or messages, we’ll see intelligent, continuous collective thinking and adaptation processes.

In other words, as the Web gets faster, we’ll start to see processes emerge within it that appear to be cohesive intelligent collective entities in their own right. There won’t really be any actual entities there that we can isolate, but when we watch the patterns on the Web it will appear as if such entities are there. This is basically what is happening at every level of scale — even in the real world. There really isn’t anything there that we can find — everything is divisible down to the quantum level and probably beyond — but over time our brains seem to recognize and label patterns as discrete “things.” This is what will happen across the Web as well. For example, a certain meme (such as a fad or a movement) may become a “thing” in its own right, a kind of entity that seemingly takes on a life of its own and seems to be doing something. Similarly, certain groups or social networks, or the activities they engage in, may seem to be intelligent entities in their own right.

This is an illusion in that there really are no entities there; they are just collections of parts that themselves can be broken down into more parts, and no final entities can be found. Nonetheless, they will seem like intelligent entities when not analyzed in detail. In addition, the behavior of these chaotic systems may resist reduction — they may not even be understandable and their behavior may not be predictable through a purely reductionist approach — it may be that they react to their own internal state and their environments virtually in real-time, making it difficult to take a top-down or bottom-up view of what they are doing. In a realtime world, change happens in every direction.

As the Web gets faster, the patterns that are taking place across it will start to become more animated. Big processes that used to take months or years to happen will happen in minutes or hours. As this comes about we will begin to see larger patterns than before, and they will start to make more sense to us — they will emerge out of the mists of time, so to speak, and become visible to us on our human timescale — the timescale of our human-level “now.” As a result, we will become more aware of higher-order dynamics taking place on the real-time Web, and we will begin to participate in and adapt to those dynamics, making those dynamics in turn even smarter. (For more on my thoughts about how the Global Brain gets smarter, see: “How to Build the Global Mind.”)

See Part II: “Will The Web Become Conscious?” if you want to dig further into the thorny philosophical and scientific issues that this brings up…

Eliminating the Need for Search – Help Engines

We are so focused on how to improve present-day search engines. But that is a kind of mental myopia. In fact, a more interesting and fruitful question is why do people search at all? What are they trying to accomplish? And is there a better way to help them accomplish that than search?

Instead of finding more ways to get people to search, or ways to make existing search experiences better, I am starting to think about how to reduce or eliminate the need to search — by replacing it with something better.

People don’t search because they like to. They search because there is something else they are trying to accomplish. So search is in fact really just an inconvenience — a means-to-an-end that we have to struggle through to do in order to get to what we actually really want to accomplish. Search is “in the way” between intention and action. It’s an intermediary stepping stone. And perhaps there’s a better way to get to where we want to go than searching.

Searching is a boring and menial activity. Think about it. We have to cleverly invent and try pseudo-natural-language queries that don’t really express what we mean. We try many different queries until we get results that approximate what we’re looking for. We click on a bunch of results and check them out. Then we search some more. And then some more clicking. Then more searching. And we never know whether we’ve been comprehensive, or have even entered the best query, or looked at all the things we should have looked at to be thorough. It’s extremely hit or miss. And takes up a lot of time and energy. There must be a better way! And there is.

Instead of making search more bloated and more of a focus, the goal should really be to get search out of the way: to minimize the need to search, and to make any search that is necessary as productive as possible. The goal should be to get consumers to what they really want with the least amount of searching and the least amount of effort, with the greatest amount of confidence that the results are accurate and comprehensive. To satisfy these constraints one must NOT simply build a slightly better search engine!

Instead, I think there’s something else we need to be building entirely. I don’t know what to call it yet. It’s not a search engine. So what is it?

Bing’s term “decision engine” is pretty good, pretty close to it. But what they’ve actually released so far still looks and feels a lot like a search engine. But at least it’s pushing the envelope beyond what Google has done with search. And this is good for competition and for consumers. Bing is heading in the right direction by leveraging natural language, semantics, and structured data. But there’s still a long way to go to really move the needle significantly beyond Google to be able to win dominant market share.

For the last decade the search wars have been fought in battles around index size, keyword search relevancy, and ad targeting — But I think the new battle is going to be fought around semantic understanding, intelligent answers, personal assistance, and commerce affiliate fees. What’s coming next after search engines are things that function more like assistants and brokers.

Wolfram Alpha is an example of one approach to this trend. The folks at Wolfram Alpha call their system a “computational knowledge engine” because they use a knowledge base to compute and synthesize answers to various questions. It does a lot of the heavy lifting for you, going through various data, computing and comparing, and then synthesizes a concise answer.

There are also other approaches to getting or generating answers for people — for example, by doing what Aardvark does: referring people to experts who can answer their questions or help them. Expert referral, or expertise search, helps reduce the need for networking and makes networking more efficient. It also reduces the need for searching online — instead of searching for an answer, just ask an expert.

There’s also the semantic search approach — perhaps exemplified by my own Twine “T2” project — which basically aims to improve the precision of search by helping you get to the right results faster, with less irrelevant noise. Other consumer facing semantic search projects of interest are Goby and Powerset (now part of Bing).

Still another approach is that of Siri, which is making an intelligent “task completion assistant” that helps you search for and accomplish things like “book a romantic dinner and a movie tonight.” In some ways Siri is a “do engine” not a “search engine.” Siri uses artificial intelligence to help you do things more productively. This is quite needed and will potentially be quite useful, especially on mobile devices.

All of these approaches and projects are promising. But I think the next frontier — the thing that is beyond search and removes the need for search is still a bit different — it is going to combine elements of all of the above approaches, with something new.

For lack of a better term, I call this a “help engine.” A help engine proactively helps you with various kinds of needs, decisions, tasks, or goals you want to accomplish. And it does this by helping with an increasingly common and vexing problem: choice overload.

The biggest problem is that we have too many choices, and the number of choices keeps increasing exponentially. The Web and globalization have increased the number of choices that are within range for all of us, but the result has been overload. To make a good, well-researched, confident choice now requires a lot of investigation, comparisons, and thinking. It’s just becoming too much work.

For example, choosing a location for an event, or planning a trip itinerary, or choosing what medicine to take, deciding what product to buy, who to hire, what company to work for, what stock to invest in, what website to read about some topic. These kinds of activities require a lot of research, evaluations of choices, comparisons, testing, and thinking. A lot of clicking. And they also happen to be some of the most monetizable activities for search engines. Existing search engines like Google that make money from getting you to click on their pages as much as possible have no financial incentive to solve this problem — if they actually worked so well that consumers clicked less they would make less money.

I think the solution to what’s after search — the “next Google” so to speak — will come from outside the traditional search engine companies. Or at least it will be an upstart project within one of them that surprises everyone and doesn’t come from the main search teams within them. It’s really such a new direction from traditional search and will require some real thinking outside of the box.

I’ve been thinking about this a lot over the last month or two. It’s fascinating. What if there was a better way to help consumers with the activities they are trying to accomplish than search? If it existed it could actually replace search. It’s a Google-sized opportunity, and one which I don’t think Google is going to solve.

Search engines cause choice overload. That wasn’t the goal, but it is what has happened over time due to the growth of the Web and the explosion of choices that are visible, available, and accessible to us via the Web.

What we need now is not a search engine — it’s something that solves the problem created by search engines. For this reason, the next Google probably won’t be Google or a search engine at all.

I’m not advocating for artificial intelligence or anything that tries to replicate human reasoning, human understanding, or human knowledge. I’m actually thinking about something simpler. I think that it’s possible to use computers to provide consumers with extremely good, automated decision-support over the Web and the kinds of activities they engage in. Search engines are almost the most primitive form of decision support imaginable. I think we can do a lot better. And we have to.

People use search engines as a form of decision-support, because they don’t have a better alternative. And there are many places where decision support and help are needed: Shopping, travel, health, careers, personal finance, home improvement, and even across entertainment and lifestyle categories.

What if there was a way to provide this kind of personal decision-support — this kind of help — with an entirely different user experience than search engines provide today? I think there is. And I’ve got some specific thoughts about this, but it’s too early to explain them; they’re still forming.

I keep finding myself thinking about this topic, and arriving at big insights in the process. All of the different things I’ve worked on in the past seem to connect to this idea in interesting ways. Perhaps it’s going to be one of the main themes I’ll be working on and thinking about for this coming decade.

Twine "T2" – Latest Demo Screenshots (Internal Alpha)

This is a series of screenshots that demo the latest build of the consumer experience and developer tools for Twine.com’s “T2” semantic search product. This is still in internal alpha — not released to public yet.

The Road to Semantic Search — The Twine.com Story

This is the story of Twine.com — our early research (with never before seen screenshots of our early semantic desktop work), and our evolution from Twine 1.0 towards Twine 2.0 (“T2”) which is focused on semantic search.

The Web Wide World — The Web Spreads Into the Physical World

I have noticed an interesting and important trend of late. The Web is starting to spread outside of what we think of as “the Web” and into “the World.” This trend is exemplified by many data points. For example:

  • The Web on mobile devices like the iPhone. Finally it’s really usable on a phone. Now it goes everywhere with us. Soon we will track our own paths on our phones as we move around, creating a virtual map of our favorite places and routes.
  • Location aware applications and services, such as Google Maps Mobile. They link physical places to virtual places on the Web.
  • The Web in cars. Auto navigation units will soon be Web-enabled.
  • Next-generation digital cameras are Wi-Fi-enabled, linking directly to GPS and to photo sharing and storage services. Will cloud-centric wireless cameras with zero local storage come next?
  • Web picture frames such as Ceiva bring the Web into your grandma’s living room.
  • The Web in restaurants and stores. Your server gets your reservation on the Web from OpenTable. In-store kiosks connect to the Web to help you shop, or to bring up your online account and shopping cart.
  • The Web in your garden. GardenGro‘s sensor connects your garden to the Web, in order to figure out what to plant and how to cultivate it in your actual location.
  • Everything becomes trackable with RFID. Physical objects have virtual locations.
  • Sensors are connecting to the Web and popping up everywhere. For example here.
  • Plastic Logic‘s portable plastic reading device. The pad of paper, version 2.0.
  • The beginnings of an Internet of Things — where every thing has an address on the Web.
  • The rise of Lifestreaming, in which everything (or much of what) one does is captured to the Web and even broadcast.
  • Progress on Augmented Reality — instead of the physical world going into virtual worlds, the virtual world is going to flow into the physical world.

These are just a few data points. There are many many more. The trendline is clear to me.

Things are not going to turn out the way we thought. Instead of everything going digital — a future in which we all live as avatars in cyberspace — the digital world is going to invade the physical world. We already are the avatars and the physical world is becoming cyberspace. The idea that cyberspace is some other place is going to dissolve because everything will be part of the Web. The digital world is going physical.

When this happens — and it will happen soon, perhaps within 20 years or less — the notion of “the Web” will become just a quaint, antique concept from the early days when the Web still lived in a box. Nobody will think about “going on the Web” or “going online” because they will never NOT be on the Web, they will always be online.

Think about that. A world in which every physical object, everything we do, and eventually perhaps our every thought and action is recorded, augmented, and possibly shared. What will the world be like when it’s all connected? When all our bodies and brains are connected together — when even our physical spaces, furniture, products, tools, and even our natural environments, are all online? Beyond just a Global Brain, we are really building a Global Body.

The World is becoming the Web. The “Web Wide World” is coming and is going to be a big theme of the next 20 years.

The Next Generation of Web Search — Search 3.0

The next generation of Web search is coming sooner than expected. And with it we will see several shifts in the way people search, and the way major search engines provide search functionality to consumers.

Web 1.0, the first decade of the Web (1989 – 1999), was characterized by a distinctly desktop-like search paradigm. The overriding idea was that the Web is a collection of documents, not unlike the folder tree on the desktop, that must be searched and ranked hierarchically. Relevancy was considered to be how closely a document matched a given query string.

Web 2.0, the second decade of the Web (1999 – 2009), ushered in the beginnings of a shift towards social search. In particular, blogging tools, social bookmarking tools, social networks, social media sites, and microblogging services began to organize the Web around people and their relationships. This added the beginnings of a primitive “web of trust” to the search repertoire, enabling search engines to begin to take the social value of content (as evidenced by discussions, ratings, sharing, linking, referrals, etc.) as an additional measurement in the relevancy equation. Those items which were both most relevant on a keyword level, and most relevant in the social graph (closer and/or more popular in the graph), were considered to be more relevant. Thus results could be ranked according to their social value — how many people in the community liked them and their current activity level — as well as by semantic relevancy measures.

In the coming third decade of the Web, Web 3.0 (2009 – 2019), there will be another shift in the search paradigm. This is a shift from the past to the present, and from the social to the personal.

Established search engines like Google rank results primarily by keyword (semantic) relevancy. Social search engines rank results primarily by activity and social value (Digg, Twine 1.0, etc.). But the new search engines of the Web 3.0 era will also take into account two additional factors when determining relevancy: timeliness, and personalization.

Google returns the same results for everyone. But why should that be the case? In fact, when two different people search for the same information, they may want to get very different kinds of results. Someone who is a novice in a field may want beginner-level information to rank higher in the results than someone who is an expert. There may be a desire to emphasize things that are novel over things that have been seen before, or that have happened in the past — the more timely something is the more relevant it may be as well.

These two themes — present and personal — will define the next great search experience.

To accomplish this, we need to make progress on a number of fronts.

First of all, search engines need better ways to understand what content is, without having to do extensive computation. The best solution for this is to utilize metadata and the methods of the emerging semantic web.

Metadata reduces the need for computation in order to determine what content is about — it makes that explicit and machine-understandable. To the extent that machine-understandable metadata is added or generated for the Web, it will become more precisely searchable and productive for searchers.

This applies especially to the real-time Web, where, for example, short “tweets” contain very little context to support good natural-language processing. There, a little metadata can go a long way. Of course, metadata also makes a dramatic difference in search of the larger, non-real-time Web.
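
As a small illustration of this point, a couple of explicit triples can make an otherwise context-poor, 140-character post machine-understandable without any NLP. The example uses the rdflib library; the URIs and the choice of Dublin Core terms are illustrative, not a proposed standard.

```python
# Hedged illustration: attach explicit, machine-readable metadata to a short
# post so software does not need to infer its meaning from the text alone.
# The namespace, IDs and vocabulary choices here are purely illustrative.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS

EX = Namespace("http://example.com/stream/")

g = Graph()
tweet = EX["tweet/12345"]
g.add((tweet, DCTERMS.creator, Literal("@example_user")))
g.add((tweet, DCTERMS.subject, EX["topics/semantic-web"]))
g.add((tweet, DCTERMS.created, Literal("2009-05-08T12:00:00Z")))

print(g.serialize(format="turtle"))
```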

In addition to metadata, search engines need to modify their algorithms to be more personalized. Instead of a “one-size fits all” ranking for each query, the ranking may differ for different people depending on their varying interests and search histories.

Finally, to provide better search of the present, search has to become more realtime. To this end, rankings need to be developed that surface not only what just happened now, but what happened recently and is also trending upwards and/or of note. Realtime search has to be more than merely listing search results chronologically. There must be effective ways to filter the noise and surface what’s most important. Social graph analysis is a key tool for doing this, but in addition, powerful statistical analysis and new visualizations may also be required to make a compelling experience.
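
A rough sketch of this blended ranking might look like the following, where classic keyword relevancy is combined with a personalization signal (overlap with the user’s interests) and a timeliness signal (exponential decay). The weights and the crude interest-overlap measure are hypothetical, not any engine’s actual formula.

```python
# Hedged sketch of Web 3.0-style ranking: keyword relevancy + personalization
# + timeliness. Weights, half-life and the overlap measure are placeholders.
import math
import time

def rank(results, user_interests, now=None,
         w_keyword=0.5, w_personal=0.3, w_timely=0.2, half_life=6 * 3600):
    now = now or time.time()

    def score(r):
        # Fraction of the result's topics that match the user's interests.
        personal = len(set(r["topics"]) & set(user_interests)) / (len(r["topics"]) or 1)
        # Exponential decay: a result one half-life old scores 0.5 on timeliness.
        age = max(now - r["published_at"], 0.0)
        timely = math.exp(-math.log(2) * age / half_life)
        return w_keyword * r["keyword_score"] + w_personal * personal + w_timely * timely

    return sorted(results, key=score, reverse=True)
```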

Welcome to the Stream – Next Phase of the Web

May 8, 2009

Welcome to The Stream

The Internet began evolving many decades before the Web emerged. And while today many people think of the Internet and the Web as one and the same, in fact they are different. The Web lives on top of the Internet’s infrastructure much like software and documents live on top of an operating system on a computer.

And just as the Web once emerged on top of the Internet, now something new is emerging on top of the Web: I call this the Stream. The Stream is the next phase of the Internet’s evolution. It’s what comes after, or on top of, the Web we’ve all been building and using.

Perhaps the best and most current example of the Stream is the rise of Twitter, Facebook and other microblogging tools. These services are visibly streamlike: their user-interfaces are literally streams of ideas, thinking and conversation. In reaction to microblogs we are also starting to see the birth of new tools to manage and interact with these streams, and to help understand, search, and follow the trends that are rippling across them. Just as the Web is not any one particular site or service, the Stream is not any one site or service — it’s the collective movement that is taking place across them all.

To meet the challenges and opportunities of the Stream, a new ecosystem of services is rapidly emerging: stream publishers, stream syndication tools, stream aggregators, stream readers, stream filters, real-time stream search engines, stream analytics engines, stream advertising networks, and stream portals. All of these new services mark the beginning of the era of the Stream.

Web History

The original Tim Berners-Lee proposal that started the Web was in March, 1989. The first two decades of the Web (Web 1.0 from 1989 – 1999, and Web 2.0 from 1999 – 2009) were focused on the development of the Web itself. Web 3.0 (2009 – 2019), the third decade of the Web, officially began in March of this year and will be focused around the Stream.

  • In the 1990’s with the advent of HTTP and HTML, the metaphor of “the Web” was born and concepts of webs and sites captured our imaginations.
  • In the early 2000’s the focus shifted to graphs such as social networks and the beginnings of the Semantic Web.
  • Now, in the coming third decade, the focus is shifting to the Stream and with it, stream oriented metaphors of flows, currents, and ripples.

The Web has always been a stream. In fact it has been a stream of streams. Each site can be viewed as a stream of pages developing over time. Each page can be viewed as a stream of words that changes whenever it is edited. Branches of sites can also be viewed as streams of pages developing in various directions.

But with the advent of blogs, feeds, and microblogs, the streamlike nature of the Web is becoming more readily visible, because these newer services are more 1-dimensional and conversational than earlier forms of websites, and they update far more frequently.

Defining the Stream

Just as the Web is formed of sites, pages and links, the Stream is formed of streams.

Streams are rapidly changing sequences of information around a topic. They may be microblogs, hashtags, feeds, multimedia services, or even data streams via APIs.

The key is that streams change often. This change is an important part of the value they provide (unlike static Websites, which do not necessarily need to change in order to provide value). In addition, it is important to note that streams have URIs — they are addressable entities.

So what defines a stream versus an ordinary website?

  1. Change. Change is the key reason why a stream is valuable. That is not always so with a website. Websites do not have to change at all to be valuable — they could, for example, be static but comprehensive reference collections. But streams, on the other hand, change very frequently, and it is this constant change that is their main point.
  2. Interface Independence.
    Streams are streams of data, and they can be fully accessed and consumed independently of any particular user-interface — via syndication of their data into various tools. Websites on the other hand, are only accessible via their user-interfaces. In the era of the Web the provider controlled the interface. In the new era of the stream, the consumer controls the interface.
  3. Conversation is king.
    An interesting and important point is that streams are linked together not by hotlinks, but by acts of conversation — for example, replies, “retweets,” comments and ratings, and “follows.” In the era of the Web the hotlink was king. But in the era of the Stream conversation is king.

In terms of structure, streams are comprised of agents, messages and interactions (a minimal code sketch follows the list below):

  • Agents are people as well as software apps that publish to streams.
  • Messages are publications by agents to streams — for example, short posts to their microblogs.
  • Interactions are communication acts, such as sending a direct message or a reply, or quoting someone (“retweeting”), that connect and transmit messages between agents.
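
Here is the minimal data-model sketch referred to above. The class and field names are purely illustrative, not a proposed standard.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Purely illustrative class and field names, not a proposed standard.

@dataclass
class Agent:
    uri: str          # people and software apps are both agents
    name: str

@dataclass
class Message:
    uri: str
    author: Agent
    text: str
    posted_at: datetime

@dataclass
class Interaction:
    kind: str         # e.g. "reply", "retweet", "follow"
    actor: Agent
    target: Message

@dataclass
class Stream:
    uri: str          # streams are addressable entities
    topic: str
    messages: List[Message] = field(default_factory=list)
    interactions: List[Interaction] = field(default_factory=list)
```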

The Global Mind

If the Internet is our collective nervous system, and the Web is our collective brain, then the Stream is our collective mind. The nervous system and the brain are like the underlying hardware and software, but the mind is what the system is actually thinking in real-time. These three layers are interconnected, yet are distinctly different aspects, of our emerging and increasingly awakened planetary intelligence.

The Stream is what the Web is thinking and doing, right now. It’s our collective stream of consciousness.

The Stream is the dynamic activity of the Web, unfolding over time. It is the conversations, the live streams of audio and video, the changes to Web sites that are happening, the ideas and trends — the memes — that are rippling across millions of Web pages, applications, and human minds.

The Now is Getting Shorter

The Web is changing faster than ever, and as this happens, it’s becoming more fluid. Sites no longer change in weeks or days, but in hours, minutes or even seconds. If we are offline even for a few minutes we may risk falling behind, or even missing something absolutely critical. The transition from a slow Web to a fast-moving Stream is happening quickly. And as this happens we are shifting our attention from the past to the present, and our “now” is getting shorter.

The era of the Web was mostly about the past — pages that were published months, weeks, days or at least hours before we looked for them. Search engines indexed the past for us to make it accessible: On the Web we are all used to searching Google and then looking at pages from the recent past and even farther back in the past. But in the era of the Stream, everything is shifting to the present — we can see new
posts as they appear and conversations emerge around them, live, while we watch.

Yet as the pace of the Stream quickens, what we think of as “now” gets shorter. Instead of now being a day, it is an hour, or a few minutes. The unit of change is getting more granular.

For example, if you monitor the public timeline, or even just your friends’ timeline on Twitter or Facebook, you see that things quickly flow out of view, into the past. Our attention is mainly focused on right now: the last few minutes or hours. Anything that was posted before this period of time is “out of sight, out of mind.”

The Stream is a world of even shorter attention spans, online viral sensations, instant fame, sudden trends, and intense volatility. It is also a world of extremely short-term conversations and thinking.

This is the world we may be entering. It is both the great challenge, and the great opportunity of the coming decade of the Web.

How Will We Cope With the Stream?

The Web has always been a stream — it has been happening in real-time since it started, but it was slower — pages changed less frequently, new things were published less often, trends developed less quickly. Today it is getting so much faster, and as this happens it’s feeding back on itself and we’re feeding into it, amplifying it even more.

Things have also changed qualitatively in recent months. The streamlike aspects of the Web have really moved into the foreground of our mainstream cultural conversation. Everyone is suddenly talking about Facebook and Twitter. Celebrities. Talk show hosts. Parents. Teens.

And suddenly we’re all finding ourselves glued to various activity streams, microblogging manically and squinting to catch fleeting references to things we care about as they rapidly flow by and out of view. The Stream has arrived.

But how can we all keep up with this ever-growing onslaught of information effectively? Will we each be knocked over by our own personal firehose, or will tools emerge to help us filter our streams down to manageable levels? And if we’re already finding that we have too many streams today, and must jump between them ever more often, how will we ever be able to function with 10X more streams in a few years?

Human attention is a tremendous bottleneck in the world of the Stream. We can only attend to one thing, or at most a few things, at once. As information comes at us from various sources, we have to jump from one item to the next. We cannot absorb it all at once. This fundamental barrier may be overcome with technology in the future, but for the next decade at least it will still be a key obstacle.

We can follow many streams, but only one-item-at-a-time; and this requires rapidly shifting our focus from one article to another and from one stream to another. And there’s no great alternative: Cramming all our separate streams into one merged activity stream quickly gets too noisy and overwhelming to use.

The ability to view different streams for different contexts is very important and enables us to filter and focus our attention effectively. As a result, it’s unlikely there will be a single activity stream — we’ll have many, many streams. And we’ll have to find ways to cope with this reality.

Streams may be unidirectional or bidirectional. Some streams are more like “feeds” that go from content providers to content consumers. Other streams are more like conversations or channels in which anyone can be both a provider and a consumer of content.

As streams become a primary mode of content distribution and communication, they will increasingly be more conversational and less like feeds. And this is important — because to participate in a feed you can be passive; you don’t have to be present synchronously. But to participate in a conversation you have to be present and synchronous — you have to be there, while it happens, or you may miss out on it entirely.

A Stream of Challenges and Opportunities

We are going to need new kinds of tools for managing and participating in streams, and we are already seeing the emergence of some of them: for example, Twitter clients like Tweetdeck, RSS feed readers, and activity stream tracking tools like Facebook and Friendfeed. There are also new tools for filtering our streams around interests, for example Twine.com (* Disclosure: the author of this article is a principal in Twine.com). Real-time search tools are also emerging to provide quick ways to scan the Stream as a whole. And trend discovery tools are helping us to see what’s hot in real-time.

One of the most difficult challenges will be how to know what to pay attention to in the Stream: Information and conversation flow by so quickly that we can barely keep up with the present, let alone the past. How will we know what to focus on, what we just have to read, and what to ignore or perhaps read later?

Recently many sites have emerged that attempt to show what is trending up in real-time, for example by measuring how many retweets various URLs are getting in Twitter. But these services only show the huge and most popular trends. What about all the important stuff that’s not trending up massively? Will people even notice things that are not widely RT’d or “liked”? Does popularity equal importance of content?

Certainly one measure of the value of an item in the Stream is social popularity. Another measure is how relevant it is to a topic, or even more importantly, to our own personal and unique interests. To really cope with the Stream we will need ways to filter that combine both of these approaches. Furthermore, as our context shifts throughout the day (for example from work, to various projects or clients, to shopping, health, entertainment, family, etc.) we need tools that can adapt to filter the Stream differently based on what we now care about.
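
A toy sketch of that combination might look like the following, where a hypothetical "contexts" table supplies per-context interest weights and popularity is crudely normalized. All of the names and numbers are invented for illustration.

```python
# Toy sketch: per-context interest weights plus crudely normalized popularity.
# The contexts, tags, weights, and items are all invented for illustration.

contexts = {
    "work":     {"clients": 1.0, "projects": 0.8},
    "personal": {"health": 1.0, "family": 0.9, "entertainment": 0.6},
}

def item_score(item, active_context, popularity_weight=0.4):
    interests = contexts[active_context]
    relevance = sum(interests.get(tag, 0.0) for tag in item["tags"])
    popularity = min(1.0, item["retweets_per_min"] / 100.0)  # crude normalization
    return popularity_weight * popularity + (1 - popularity_weight) * relevance

def filter_stream(items, active_context, top_n=5):
    return sorted(items, key=lambda i: item_score(i, active_context), reverse=True)[:top_n]

items = [
    {"text": "New client brief posted", "tags": ["clients"], "retweets_per_min": 0.1},
    {"text": "Viral cat video", "tags": ["entertainment"], "retweets_per_min": 50.0},
]
print(filter_stream(items, "work")[0]["text"])      # New client brief posted
print(filter_stream(items, "personal")[0]["text"])  # Viral cat video
```

The point of the example is that the same two items rank differently depending on the active context, which is exactly the kind of adaptivity described above.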

A Stream-oriented Internet also offers new opportunities for monetization. For example, new ad distribution networks could form to enable advertisers to buy impressions in near-real time across URLs that are trending up in the Stream, or within various slices of it. An advertiser could, for instance, distribute their ad across dozens of pages that are getting heavily retweeted right now. As those pages begin to decline in RTs per minute, the ads might begin to move over to different URLs that are starting to gain.

Ad networks that do a good job of measuring real-time attention trends may be able to capitalize on these trends faster and provide better results to advertisers. For example, an advertiser that is able to detect and immediately jump on the hot new meme of the day, could get their ad in front of the leading influencers they want to reach, almost instantly. And this could translate to sudden gains in awareness and branding.
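
As a rough sketch of how such a network might shift spend as attention moves, the toy function below splits an impression budget in proportion to each URL's current retweets-per-minute. The URLs, velocities, and budget are invented numbers, not real data.

```python
# Toy sketch: split an impression budget in proportion to current RT velocity.
# The URLs, velocities, and budget are invented numbers.

def allocate_impressions(total_impressions: int, velocities: dict) -> dict:
    total_velocity = sum(velocities.values())
    if total_velocity == 0:
        even = total_impressions // len(velocities)
        return {url: even for url in velocities}
    return {url: int(total_impressions * v / total_velocity)
            for url, v in velocities.items()}

# As one page's RT rate declines and another's rises, the allocation follows:
print(allocate_impressions(10_000, {"url-a": 45.0, "url-b": 5.0}))   # mostly url-a
print(allocate_impressions(10_000, {"url-a": 10.0, "url-b": 40.0}))  # mostly url-b
```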

The emergence of the Stream is an interesting paradigm shift that may turn out to characterize the next evolution of the Web, this coming third decade of the Web’s development. Even though the underlying data model may be increasingly like a graph, or even a semantic graph, the user experience will be increasingly stream-oriented.

Whether Twitter, or some other app, the Web is becoming increasingly streamlike. How will we filter this stream? How will we cope? Whoever can solve these problems first and best is probably going to get rich.

Other Articles on This Topic

http://www.techmeme.com/090517/p6#a090517p6

http://www.techcrunch.com/2009/05/17/jump-into-the-stream/

http://www.techcrunch.com/2009/02/15/mining-the-thought-stream/

Wolfram Alpha is Coming — And It Could be as Important as Google

Notes:

– This article last updated on March 11, 2009.

– For follow-up, connect with me about this on Twitter here.

– See also: for more details, be sure to read the new review by Doug Lenat, creator of Cyc. He just saw the Wolfram Alpha demo and has added many useful insights.

——————————————————————–

Introducing Wolfram Alpha

Stephen Wolfram is building something new — and it is really impressive and significant. In fact it may be as important for the Web (and the world) as Google, but for a different purpose. It’s not a “Google killer” — it does something different. It’s an “answer engine” rather than a search engine.

Stephen was kind enough to spend two hours with me last week to demo his new online service — Wolfram Alpha (scheduled to open in May). In the course of our conversation we took a close look at Wolfram Alpha’s capabilities, discussed where it might go, and what it means for the Web, and even the Semantic Web.

Stephen has not released many details of his project publicly yet, so I will respect that and not give a visual description of exactly what I saw. However, he has revealed it a bit in a recent article, and so below I will give my reactions to what I saw and what I think it means. And from that you should be able to get at least some idea of the power of this new system.

A Computational Knowledge Engine for the Web

In a nutshell, Wolfram and his team have built what he calls a “computational knowledge engine” for the Web. OK, so what does that really mean? Basically it means that you can ask it factual questions and it computes answers for you.

It doesn’t simply return documents that (might) contain the answers, like Google does, and it isn’t just a giant database of knowledge, like the Wikipedia. It doesn’t simply parse natural language and then use that to retrieve documents, like Powerset, for example.

Instead, Wolfram Alpha actually computes the answers to a wide range of questions that have factual answers, such as “What is the location of Timbuktu?”, “How many protons are in a hydrogen atom?”, “What was the average rainfall in Boston last year?”, “What is the 307th digit of Pi?”, or “What would 80/20 vision look like?”

Think about that for a minute. It computes the answers. Wolfram Alpha doesn’t simply contain huge amounts of manually entered pairs of questions and answers, nor does it search for answers in a database of facts. Instead, it understands and then computes answers to certain kinds of questions.

(Update: in fact, Wolfram Alpha doesn’t merely answer questions, it also helps users to explore knowledge, data and relationships between things. It can even open up new questions — the “answers” it provides include computed data or facts, plus relevant diagrams, graphs, and links to other related questions and sources. It can also be used to ask questions that are new explorations of the relationships between data sets or systems of knowledge. It does not just provide textual answers to questions — it helps you explore ideas and create new knowledge as well.)

How Does it Work?

Wolfram Alpha is a system for computing the answers to questions. To accomplish this it uses built-in models of fields of knowledge, complete with data and algorithms, that represent real-world knowledge.

For example, it contains formal models of much of what we know about science — massive amounts of data about various physical laws and properties, as well as data about the physical world.

Based on this you can ask it scientific questions and it can compute the answers for you, even if it has not been programmed explicitly to answer each question you might ask it.

But science is just one of the domains it knows about — it also knows about technology, geography, weather, cooking, business, travel, people, music, and more.

Alpha does not answer natural language queries — you have to ask questions in a particular syntax, or various forms of abbreviated notation. This requires a little bit of learning, but it’s quite intuitive and in some cases even resembles natural language or the keywordese we’re used to in Google.

The vision seems to be to create a system which can do for formal knowledge (all the formally definable systems, heuristics, algorithms, rules, methods, theorems, and facts in the world) what search engines have done for informal knowledge (all the text and documents in various forms of media).

How Does it Differ from Google?

Wolfram Alpha and Google are very different animals. Google is designed to help people find Web pages. It’s a big lookup system basically, a librarian for the Web. Wolfram Alpha on the other hand is not at all oriented towards finding Web pages; it’s for computing factual answers. It’s much more like a giant calculator for computing all sorts of answers to questions that involve or require numbers. Alpha is for calculating, not for finding. So it doesn’t compete with Google’s core business at all. In fact, it is much more competitive with the Wikipedia than with Google.

On the other hand, while Alpha doesn’t compete with Google, Google may compete with Alpha. Google is increasingly trying to answer factual questions directly — for example unit conversions, questions about the time, the weather, the stock market, geography, etc. But in this area, Alpha has a powerful advantage: it’s built on top of Wolfram’s Mathematica engine, which represents decades of work and is perhaps the most powerful calculation engine ever built.

How Smart is it and Will it Take Over the World?

Wolfram Alpha is like plugging into a vast electronic brain. It provides extremely impressive and thorough answers to a wide range of questions asked in many different ways, and it computes answers, it doesn’t merely look them up in a big database.

In this respect it is vastly smarter than (and different from) Google. Google simply retrieves documents based on keyword searches. Google doesn’t understand the question or the answer, and doesn’t compute answers based on models of various fields of human knowledge.

But as intelligent as it seems, Wolfram Alpha is not HAL 9000, and it wasn’t intended to be. It doesn’t have a sense of self or opinions or feelings. It’s not artificial intelligence in the sense of being a simulation of a human mind. Instead, it is a system that has been engineered to provide really rich knowledge about human knowledge — it’s a very powerful calculator that doesn’t just work for math problems — it works for many other kinds of questions that have unambiguous (computable) answers.

There is no risk of Wolfram Alpha becoming too smart, or taking over the world. It’s good at answering factual questions; it’s a computing machine, a tool — not a mind.

One of the most surprising aspects of this project is that Wolfram has been able to keep it secret for so long. I say this because it is a monumental effort (and achievement) and almost absurdly ambitious. The project involves more than a hundred people working in stealth to create a vast system of reusable, computable knowledge, from terabytes of raw data, statistics, algorithms, data feeds, and expertise. But he appears to have done it, and kept it quiet for a long time while it was being developed.

Computation Versus Lookup

For those who are more scientifically inclined, Stephen showed me many interesting examples — for example, Wolfram Alpha was able to solve novel numeric sequencing problems, calculus problems, and could answer questions about the human genome too. It was also able to compute answers to questions about many other kinds of topics (cooking, people, economics, etc.). Some commenters on this article have mentioned that in some cases Google appears to be able to answer questions, or at least the answers appear at the top of Google’s results. So what is the Big Deal? The Big Deal is that Wolfram Alpha doesn’t merely look up the answers like Google does, it computes them using at least some level of domain understanding and reasoning, plus vast amounts of data about the topic being asked about.

Computation is in many cases a better alternative to lookup. For example, you could solve math problems using lookup — that is what a multiplication table is, after all. For a small multiplication table, lookup might even be cheaper than computing the answers. But imagine trying to create a lookup table of all answers to all possible multiplication problems — an infinite multiplication table. That is a clear case where lookup is no longer a better option than computation.

The ability to compute the answer on a case by case basis, only when asked, is clearly more efficient than trying to enumerate and store an infinitely large multiplication table. The computation approach only requires a finite amount of data storage — just enough to store the algorithms for solving general multiplication problems — whereas the lookup table approach requires an infinite amount of storage — it requires actually storing, in advance, the products of all pairs of numbers.
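
The multiplication example can be made concrete in a few lines. Both approaches below give the same answers for the cases the table happens to cover, but only the computed version generalizes to inputs nobody stored in advance.

```python
# Both approaches answer the questions the table happens to cover, but only
# the computed version generalizes to inputs nobody stored in advance.

lookup_table = {(a, b): a * b for a in range(10) for b in range(10)}  # finite, pre-stored

def compute_product(a: int, b: int) -> int:
    """A tiny 'algorithm' covering infinitely many cases with no stored answers."""
    return a * b

print(lookup_table[(7, 8)])           # 56 -- it was enumerated in advance
print(compute_product(7, 8))          # 56
print(compute_product(123456, 789))   # 97406784 -- never stored anywhere
```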

(Note: If we really want to store the products of ALL pairs of numbers, it turns out this is impossible to accomplish, because there are an infinite number of numbers. It would require an infinite amount of time to simply generate the data, and an infinite amount of storage to store it. In fact, just to enumerate and store all the multiplication products of the numbers between 0 and 1 would require an infinite amount of time and storage. This is because the real numbers are uncountable. There are in fact more real numbers than integers (see the work of Georg Cantor on this). However, the same problem holds even if we are speaking of integers — it would require an infinite amount of storage to store all their multiplication products, although they at least could be enumerated, given infinite time.)

Using the above analogy, we can see why a computational system like Wolfram Alpha is ultimately a more efficient way to compute the answers to many kinds of factual questions than a lookup system like Google. Even though Google is becoming increasingly comprehensive as more information comes on-line and gets indexed, it will never know EVERYTHING. Google is effectively just a lookup table of everything that has been written and published on the Web, that Google has found. But not everything has been published yet, and furthermore Google’s index is also incomplete, and always will be.

Therefore Google does and always will contain gaps. It cannot possibly index the answer to every question that matters or will matter in the future — it doesn’t contain all the questions or all the answers. If nobody has ever published a particular question-answer pair onto some Web page, then Google will not be able to index it, and won’t be able to help you find the answer to that question — UNLESS Google also is able to compute the answer like Wolfram Alpha does (an area that Google is probably working on, but most likely not to as sophisticated a level as Wolfram’s Mathematica engine enables).

While Google only provides answers that are found on some Web page (or at least in some data set it indexes), a computational knowledge engine like Wolfram Alpha can provide answers to questions it has never seen before — provided, however, that it at least knows the necessary algorithms for answering such questions, and it at least has sufficient data to compute the answers using these algorithms. This is a “big if” of course.

Wolfram Alpha substitutes computation for storage. It is simply more compact to store general algorithms for computing the answers to various types of potential factual questions, than to store all possible answers to all possible factual questions. In the end, making this tradeoff in favor of computation wins, at least for subject domains where the space of possible factual questions and answers is large. A computational engine is simply more compact and extensible than a database of all questions and answers.

This tradeoff, as Mills Davis points out in the comments to this article is also referred to as the tradeoff between time and space in computation. For very difficult computations, it may take a long time to compute the answer. If the answer was simply stored in a database already of course that would be faster and more efficient. Therefore, a hybrid approach would be for a system like Wolfram Alpha to store all the answers to any questions that have already been asked of it, so that they can be provided by simple lookup in the future, rather than recalculated each time. There may also already be databases of precomputed answers to very hard problems, such as finding very large prime numbers for example. These should also be stored in the system for simple lookup, rather than having to be recomputed. I think that Wolfram Alpha is probably taking this approach. For many questions it doesn’t make sense to store all the answers in advance, but certainly for some questions it is more efficient to store the answers, when you already know them, and just look them up.
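
The hybrid approach described above is essentially memoization: compute an answer the first time it is requested, then serve it by lookup afterwards. A minimal sketch, using Python's standard cache decorator and a stand-in for the real computation, might look like this:

```python
from functools import lru_cache

# Sketch of the hybrid strategy: compute once, then answer by lookup.
# The "expensive computation" is just a stand-in, not a real model.

@lru_cache(maxsize=100_000)
def answer(question_key: str) -> str:
    # Imagine an expensive, model-driven computation happening here.
    return f"computed answer for {question_key!r}"

answer("average rainfall boston 2008")   # computed the first time...
answer("average rainfall boston 2008")   # ...served from the cache afterwards
print(answer.cache_info())               # hits=1, misses=1
```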

Other Competition

Where Google is a system for FINDING things that we as a civilization collectively publish, Wolfram Alpha is for COMPUTING answers to questions about what we as a civilization collectively know. It’s the next step in the distribution of knowledge and intelligence around the world — a new leap in the intelligence of our collective “Global Brain.” And like any big next-step, Wolfram Alpha works in a new way — it computes answers instead of just looking them up.

Wolfram Alpha, at its heart is quite different from a brute force statistical search engine like Google. And it is not going to replace Google — it is not a general search engine: You would probably not use Wolfram Alpha to shop for a new car, find blog posts about a topic, or to choose a resort for your honeymoon. It is not a system that will understand the nuances of what you consider to be the perfect romantic getaway, for example — there is still no substitute for manual human-guided search for that. Where it appears to excel is when you want facts about something, or when you need to compute a factual answer to some set of questions about factual data.

I think the folks at Google will be surprised by Wolfram Alpha, and they will probably want to own it, but not because it risks cutting into their core search engine traffic. Instead, it will be because it opens up an entirely new field of potential traffic around questions, answers and computations that you can’t do on Google today.

The services that are probably going to be most threatened by a service like Wolfram Alpha are the Wikipedia, Cyc, Metaweb’s Freebase, True Knowledge, the START Project, and natural language search engines (such as Microsoft’s upcoming search engine, based perhaps in part on Powerset’s technology), and other services that are trying to build comprehensive factual knowledge bases.

As a side-note, my own service, Twine.com, is NOT trying to do what Wolfram Alpha is trying to do, fortunately. Instead, Twine uses the Semantic Web to help people filter the Web, organize knowledge, and track their interests. It’s a very different goal. And I’m glad, because I would not want to be competing with Wolfram Alpha. It’s a force to be reckoned with.

Relationship to the Semantic Web

During our discussion, after I tried and failed to poke holes in his natural language parser for a while, we turned to the question of just what this thing is, and how it relates to other approaches like the Semantic Web.

The first question was could (or even should) Wolfram Alpha be built using the Semantic Web in some manner, rather than (or as well as) the Mathematica engine it is currently built on. Is anything missed by not building it with Semantic Web’s languages (RDF, OWL, Sparql, etc.)?

The answer is that there is no reason that one MUST use the Semantic Web stack to build something like Wolfram Alpha. In fact, in my opinion it would be far too difficult to try to explicitly represent everything Wolfram Alpha knows and can compute using OWL ontologies and the reasoning that they enable. It is just too wide a range of human knowledge and giant OWL ontologies are too difficult to build and curate.

It would of course at some point be beneficial to integrate with the Semantic Web so that the knowledge in Wolfram Alpha could be accessed, linked with, and reasoned with, by other semantic applications on the Web, and perhaps to make it easier to pull knowledge in from outside as well. Wolfram Alpha could probably play better with other Web services in the future by providing RDF and OWL representations of its knowledge, via a SPARQL query interface — the basic open standards of the Semantic Web. However, for the internal knowledge representation and reasoning that takes place in Wolfram Alpha, OWL and RDF are not required, and it appears Wolfram has found a more pragmatic and efficient representation of his own.

I don’t think he needs the Semantic Web INSIDE his engine, at least; it seems to be doing just fine without it. This view is in fact not different from the current mainstream approach to the Semantic Web — as one commenter on this article pointed out, “what you do in your database is your business” — the power of the Semantic Web is really for knowledge linking and exchange — for linking data and reasoning across different databases. As Wolfram Alpha connects with the rest of the “linked data Web,” Wolfram Alpha could benefit from providing access to its knowledge via OWL, RDF and Sparql. But that’s off in the future.

It is important to note that just like OpenCyc (which has taken decades to build up a very broad knowledge base of common sense knowledge and reasoning heuristics), Wolfram Alpha is also a centrally hand-curated system. Somehow, perhaps just secretly but over a long period of time, or perhaps due to some new formulation or methodology for rapid knowledge-entry, Wolfram and his team have figured out a way to make the process of building up a broad knowledge base about the world practical where all others who have tried this have found it takes far longer than expected. The task is gargantuan — there is just so much diverse knowledge in the world. Representing even a small area of it formally turns out to be extremely difficult and time-consuming.

It has generally not been considered feasible for any one group to hand-curate all knowledge about every subject. The centralized hand-curation of Wolfram Alpha is certainly more controllable, manageable and efficient for a project of this scale and complexity. It avoids problems of data quality and data-consistency. But it’s also a potential bottleneck and most certainly a cost-center. Yet it appears to be a tradeoff that Wolfram can afford to make, and one worth making as well, from what I could see. I don’t yet know how Wolfram has managed to assemble his knowledge base in less than a very long time, or even how much knowledge he and his team have really added, but at first glance it seems to be a large amount. I look forward to learning more about this aspect of the project.

Building Blocks for Knowledge Computing

Wolfram Alpha is almost more of an engineering accomplishment than a scientific one — Wolfram has broken down the set of factual questions we might ask, and the computational models and data necessary for answering them, into basic building blocks — a kind of basic language for knowledge computing if you will. Then, with these building blocks in hand his system is able to compute with them — to break down questions into the basic building blocks and computations necessary to answer them, and then to actually build up computations and compute the answers on the fly.

Wolfram’s team manually entered, and in some cases automatically pulled in, masses of raw factual data about various fields of knowledge, plus models and algorithms for doing computations with the data. By building all of this in a modular fashion on top of the Mathematica engine, they have built a system that is able to actually do computations over vast data sets representing real-world knowledge. More importantly, it enables anyone to easily construct their own computations — simply by asking questions.

The scientific and philosophical underpinnings of Wolfram Alpha are similar to those of the cellular automata systems he describes in his book, “A New Kind of Science” (NKS). Just as with cellular automata (such as the famous “Game of Life” algorithm that many have seen on screensavers), a set of simple rules and data can be used to generate surprisingly diverse, even lifelike patterns. One of the observations of NKS is that incredibly rich, even unpredictable patterns, can be generated from tiny sets of simple rules and data, when they are applied to their own output over and over again.
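
For readers who have not seen one, here is a tiny elementary cellular automaton (Rule 30) in Python. It is only an analogy for the NKS point about simple rules producing rich patterns, not a claim about how Wolfram Alpha itself is implemented.

```python
# Rule 30, an elementary cellular automaton: one small rule table produces a
# surprisingly complex pattern. This is only an analogy for the NKS idea, not
# a claim about how Wolfram Alpha itself is implemented.

RULE_30 = {(1, 1, 1): 0, (1, 1, 0): 0, (1, 0, 1): 0, (1, 0, 0): 1,
           (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}

def step(cells):
    padded = [0] + cells + [0]
    return [RULE_30[(padded[i - 1], padded[i], padded[i + 1])]
            for i in range(1, len(padded) - 1)]

row = [0] * 15 + [1] + [0] * 15   # a single "on" cell in the middle
for _ in range(12):
    print("".join("#" if c else "." for c in row))
    row = step(row)
```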

In fact, cellular automata, by using just a few simple repetitive rules, can compute anything any computer or computer program can compute, in theory at least. But actually using such systems to build real computers or useful programs (such as Web browsers) has never been practical because they are so low-level it would not be efficient (it would be like trying to build a giant computer, starting from the atomic level).

The simplicity and elegance of cellular automata proves that anything that may be computed — and potentially anything that may exist in nature — can be generated from very simple building blocks and rules that interact locally with one another. There is no top-down control, there is no overarching model. Instead, from a bunch of low-level parts that interact only with other nearby parts, complex global behaviors emerge that, for example, can simulate physical systems such as fluid flow, optics, population dynamics in nature, voting behaviors, and perhaps even the very nature of space-time. This is the main point of the NKS book in fact, and Wolfram draws numerous examples from nature and cellular automata to make his case.

But with all its focus on recombining simple bits of information according to simple rules, cellular automata is not a reductionist approach to science — in fact, it is much more focused on synthesizing complex emergent behaviors from simple elements than in reducing complexity back to simple units. The highly synthetic philosophy behind NKS is the paradigm shift at the basis of Wolfram Alpha’s approach too. It is a system that is very much “bottom-up” in orientation. This is not to say that Wolfram Alpha IS a cellular automaton itself — but rather that it is similarly based on fundamental rules and data that are recombined to form highly sophisticated structures.

Wolfram has created a set of building blocks for working with formal knowledge to generate useful computations, and in turn, by putting these computations together you can answer even more sophisticated questions and so on. It’s a system for synthesizing sophisticated computations from simple computations. Of course anyone who understands computer programming will recognize this as the very essence of good software design. But the key is that instead of forcing users to write programs to do this in Mathematica, Wolfram Alpha enables them to simply ask questions in natural language and then automatically assembles the programs to compute the answers they need.

Wolfram Alpha perhaps represents what may be a new approach to creating an “intelligent machine” that does away with much of the manual labor of explicitly building top-down expert systems about fields of knowledge (the traditional AI approach, such as that taken by the Cyc project), while simultaneously avoiding the complexities of trying to do anything reasonable with the messy distributed knowledge on the Web (the open-standards Semantic Web approach). It’s simpler than top-down AI and easier than the original vision of the Semantic Web.

Generally if someone had proposed doing this to me, I would have said it was not practical. But Wolfram seems to have figured out a way to do it. The proof is that he’s done it. It works. I’ve seen it myself.

Questions Abound

Of course, questions abound. It remains to be seen just how smart Wolfram Alpha really is, or can be. How easily extensible is it? Will it get increasingly hard to add and maintain knowledge as more is added to it? Will it ever make mistakes? What forms of knowledge will it be able to handle in the future?

I think Wolfram would agree that it is probably never going to be able to give relationship or career advice, for example, because that is “fuzzy” — there is often no single right answer to such questions. And I don’t know how comprehensive it is, or how it will be able to keep up with all the new knowledge in the world (the knowledge in the system is exclusively added by Wolfram’s team right now, which is a labor intensive process). But Wolfram is an ambitious guy. He seems confident that he has figured out how to add new knowledge to the system at a fairly rapid pace, and he seems to be planning to make the system extremely broad.

And there is the question of bias, which we addressed as well. Is there any risk of bias in the answers the system gives because all the knowledge is entered by Wolfram’s team? Those who enter the knowledge and design the formal models are in a position to define the way the system thinks — both the questions and the answers it can handle. Wolfram believes that by focusing on factual knowledge — things like you might find in the Wikipedia or textbooks or reports — the bias problem can be avoided. At least he is focusing the system on questions that have only one answer — not questions for which there might be many different opinions. Everyone generally agrees, for example, that the closing price of GOOG on a certain date is a particular dollar amount. It is not debatable. These are the kinds of questions the system addresses.

But even for some supposedly factual questions, there are potential biases in the answers one might come up with, depending on the data sources and paradigms used to compute them. Thus the choice of data sources has to be made carefully to try to reflect as non-biased a view as possible. Wolfram’s strategy is to rely on widely accepted data sources like well-known scientific models, public data about factual things like the weather, geography and the stock market published by reputable organizations and government agencies, etc. But of course even this is a particular worldview and reflects certain implicit or explicit assumptions about what data sources are authoritative.

This is a system that reflects one perspective — that of Wolfram and his team — which probably is a close approximation of the mainstream consensus scientific worldview of our modern civilization. It is a tool — a tool for answering questions about the world today, based on what we generally agree that we know about it. Still, this is potentially murky philosophical territory, at least for some kinds of questions. Consider global warming — not all scientists even agree it is taking place, let alone what it signifies or where the trends are headed. Similarly in economics, based on certain assumptions and measurements we are either experiencing only mild inflation right now, or significant inflation. There is not necessarily one right answer — there are valid alternative perspectives.

I agree with Wolfram that bias in the data choices will not be a problem, at least for a while. But even scientists don’t always agree on the answers to factual questions, or on what models to use to describe the world — and this disagreement is essential to progress in science, in fact. If there were only one “right” answer to any question there could never be progress, or even different points of view. Fortunately, Wolfram is designing his system to link to alternative questions and answers at least, and even to sources for more information about the answers (such as the Wikipedia, for example). In this way he can provide unambiguous factual answers, yet also connect to more information and points of view about them at the same time. This is important.

It is ironic that a system like Wolfram Alpha, which is designed to answer questions factually, will probably bring up a broad range of questions that don’t themselves have unambiguous factual answers — questions about philosophy, perspective, and even public policy in the future (if it becomes very widely used). It is a system that has the potential to touch our lives as deeply as Google. Yet how widely it will be used is an open question too.

The system is beautiful, and the user interface is already quite simple and clean. In addition, answers include computationally generated diagrams and graphs — not just text. It looks really cool. But it is also designed by and for people with IQs somewhere in the altitude of Wolfram’s — some work will need to be done dumbing it down a few hundred IQ points so as to not overwhelm the average consumer with answers that are so comprehensive that they require a graduate degree to fully understand.

It also remains to be seen how much the average consumer thirsts for answers to factual questions. I do think all consumers at times have a need for this kind of intelligence once in a while, but perhaps not as often as they need something like Google. But I am sure that academics, researchers, students, government employees, journalists and a broad range of professionals in all fields definitely need a tool like this and will use it every day.

Future Potential

I think there is more potential to this system than Stephen has revealed so far. I think he has bigger ambitions for it in the long-term future. I believe it has the potential to be THE online service for computing factual answers. THE system for factual knowledge on the Web. More than that, it may eventually have the potential to learn and even to make new discoveries. We’ll have to wait and see where Wolfram takes it.

Maybe Wolfram Alpha could even do a better job of retrieving documents than Google, for certain kinds of questions — by first understanding what you really want, then computing the answer, and then giving you links to documents that relate to the answer. But even if it is never applied to document retrieval, I think it has the potential to play a leading role in all our daily lives — it could function like a kind of expert assistant, with all the facts and computational power in the world at our fingertips.

I would expect that Wolfram Alpha will open up various APIs in the future, and then we’ll begin to see some interesting new, intelligent applications emerge based on its underlying capabilities and what it knows already.

In May, Wolfram plans to open up what I believe will be a first version of Wolfram Alpha. Anyone interested in a smarter Web will find it quite interesting, I think. Meanwhile, I look forward to learning more about this project as Stephen reveals more in months to come.

One thing is certain: Wolfram Alpha is quite impressive, and Stephen Wolfram deserves all the congratulations he is soon going to get.

Appendix: Answer Engines vs. Search Engines

The above article about Wolfram Alpha has created quite a stir on the blogosphere (Note: For those who haven’t used Techmeme before: just move your mouse over the “discussion” links under the Techmeme headline and expand to see references to related responses)

But while the response from most was quite positive and hopeful, some writers jumped to conclusions, went snarky, or entirely missed the point.

For example some articles such as this one by Jon Stokes at Ars Technica, quickly veered into refuting points that I in fact never made (Stokes seems to have not actually read my article in full before blogging his reply perhaps, or maybe he did read it but simply missed my point).

Other articles, such as this one by Saul Hansell of the New York Times’ Bits blog, focused on the business questions — again a topic that I did not address in my article. My article was about the technology, not the company or the business opportunity.

The most common misconception in the articles that missed the point concerns whether Wolfram Alpha is a “Google killer.”

In fact I was very careful in the title of my article, and in the content, to make the distinction between Wolfram Alpha and Google. And I tried to make it clear that Wolfram Alpha is not designed to be a “Google killer.” It has a very different purpose: it doesn’t compete with Google for general document retrieval; instead it answers factual questions.

Wolfram Alpha is an “answer engine” not a search engine.

Answer engines are a different category of tool from search engines. They understand and answer questions — they don’t simply retrieve documents. (Note: in fact, Wolfram Alpha doesn’t merely answer questions; it also helps users to explore knowledge and data visually, and can even open up new questions.)

Of course Wolfram Alpha is not alone in making a system that can answer questions. This has been a longstanding dream of computer scientists, artificial intelligence theorists, and even a few brave entrepreneurs in the past.

Google has also been working on answering questions that are typed directly into their search box. For example, type a geography question or even “what time is it in Italy” into the Google search box and you will get a direct answer. But the reasoning and computational capabilities of Google’s “answer engine” features are primitive compared to what Wolfram Alpha does.

For example, the Google search box does not compute answers to calculus problems, or tell you what phase the moon will be in on a certain future date, or tell you the distance from San Francisco to Ulan Bator, Mongolia.

Many questions can or might be answered by Google, using simple database lookup, provided that Google already has the answers in its index or databases. But there are many questions that Google does not yet find or store the answers to efficiently. And there always will be.

Google’s search box provides some answers to common computational questions (perhaps by looking them up in a big database in some cases, or perhaps by computing the answers in other cases). But so far it has limited range. Of course the folks at Google could work more on this. They have the resources if they want to. But they are far behind Wolfram Alpha, and others (for example, the START project, which I just learned about today, True Knowledge, and the Cyc project, among many others).

The approach taken by Wolfram Alpha — and others working on “answer engines” is not to build the world’s largest database of answers but rather to build a system that can compute answers to unanticipated questions. Google has built a system that can retrieve any document on the Web. Wolfram Alpha is designed to be a system that can answer any factual question in the world.

Of course, if the Wolfram Alpha people are clever (and they are), they will probably design their system to also leverage databases of known answers whenever they can, and to also store any new answers they compute to save the trouble of re-computing them if asked again in the future. But they are fundamentally not making a database lookup oriented service. They are making a computation oriented service.

Answer engines do not compete with search engines, but some search engines (such as Google) may compete with answer engines. Time will tell if search engine leaders like Google will put enough resources into this area of functionality to dominate it, or whether they will simply team up with the likes of Wolfram and/or others who have put a lot more time into this problem already.

In any case, Wolfram Alpha is not a “Google killer.” It wasn’t designed to be one. It does however answer useful questions — and everyone has questions. There is an opportunity to get a lot of traffic, depending on things that still need some thought (such as branding, for starters). The opportunity is there, although we don’t yet know whether Wolfram Alpha will win it. I think it certainly has all the hallmarks of a strong contender at least.

Video: My Talk on the Evolution of the Global Brain at the Singularity Summit

If you are interested in collective intelligence, consciousness, the global brain and the evolution of artificial intelligence and superhuman intelligence, you may want to see my talk at the 2008 Singularity Summit. The videos from the Summit have just come online.

(Many thanks to Hrafn Thorisson who worked with me as my research assistant for this talk).

How to Build the Global Mind

Kevin Kelly recently wrote another fascinating article about evidence of a global superorganism. It’s another useful contribution to the ongoing evolution of this meme.

I tend to agree that we are at what Kevin calls Stage III. However, an important distinction in my own thinking is that the superorganism is not comprised just of machines, but is also comprised of people.

(Note: I propose that we abbreviate the One Machine, as “the OM.” It’s easier to write and it sounds cool.)

Today, humans still make up the majority of processors in the OM. Each human nervous system comprises billions of processors, and there are billions of humans. That’s a lot of processors.

However, Ray Kurzweil posits that the balance of processors is rapidly moving towards favoring machines — and that sometime in the latter half of this century, machine processors will outnumber or at least outcompute all the human processors combined, perhaps many times over.

While I agree with Ray’s point that machine intelligence will soon outnumber human intelligence, I’m skeptical of Kurzweil’s timeline, especially in light of recent research that shows evidence of quantum-level computation within microtubules inside neurons. If in fact the brain computes at the tubulin level, then it may have many orders of magnitude more processors than currently estimated. This remains to be determined. Those who argue against this claim that the brain can be modelled on a classical level and that quantum computing need not be invoked. To be clear, I am not claiming that the brain is a quantum computer; I am claiming that there seems to be evidence that computation in the brain takes place at the quantum level, or near it. Whether quantum effects have any measurable effect on what the brain does is not the question; the question is simply whether microtubules are the lowest-level processing elements of the brain. If they are, then there are a whole lot more processors in the brain than previously thought.

Another point worth considering is that much of the brain’s computation is not taking place within the neurons but rather in the gaps between synapses, and this computation happens chemically rather than electrically. There are vastly more synapses than neurons, and computation within the synapses happens at a much faster and more granular level than neuronal firings. It is definitely the case that chemical-level computations take place with elements that are many orders of magnitude smaller than neurons. This is another case for the brain computing at a much lower level than is currently thought.

In other words the resolution of computation in the human brain is still unknown. We have several competing approximations but no final answer on this. I do think however that evidence points to computation being much more granular than we currently think.

In any case, I do agree with Kurzweil that artificial computers will eventually outnumber naturally occurring human computers on this planet — it’s just a question of when. In my view it will take a little longer than he thinks: likely 100 to 200 years at the most.

There is another aspect of my thinking on this subject which I think may throw a wrench in the works. I don’t think that what we call “consciousness” is something that can be synthesized. Humans appear to be conscious, but we have no idea what that means yet. It is undeniable that we all have an experience of being conscious, and this experience is mysterious. It is also the case that, at least so far, nobody has built a software program or hardware device that seems to be having this experience. We don’t even know how to test for consciousness, in fact. For example, the much-touted Turing Test does not test consciousness; it tests humanlike intelligence. There really isn’t a test for consciousness yet. Devising one is an interesting and important goal that we should perhaps be working on.

In my own view, consciousness is probably fundamental to the substrate of the universe, like space, time and energy. We don’t know what space, time and energy actually are. We cannot actually measure them directly either. All our measurements of space, time and energy are indirect — we measure other things that imply that space, time and energy exist. Space, time and energy are inferred by effects we observe on material things that we can measure. I think the same may be true of consciousness. So the question is, what are the measurable effects of consciousness? Well, one candidate seems to be the Double Slit experiment, which shows that the act of observation causes the quantum wave function to collapse. Are there other effects we can cite as evidence of consciousness?

I have recently been wondering how connected consciousness is to the substrate of the universe we are in. If consciousness is a property of the substrate, then it may be impossible to synthesize. For example, we never synthesize space, time or energy — no matter what we do, we are simply using the space, time and energy of the substrate that is this universe.

If this is the case, then creating consciousness is impossible. The best we can do is somehow channel the consciousness that is already there in the substrate of the universe. In fact, that may be what the human nervous system does: it channels consciousness, much in the way that an electrical circuit channels electricity. The reason that software programs will probably not become conscious is that they are too many levels removed from the substrate. There is little or no feedback between the high-level representations of cognition in AI programs and the quantum-level computation (and possibly consciousness) of the physical substrate of the universe. That is not the case in the human nervous system — in the human nervous system the basic computing elements and all the cognitive activity are directly tied to the physical substrate of the universe. There is at least the potential for two-way feedback to take place between the human mind (the software), the human brain (a sort of virtual machine), and the quantum field (the actual hardware).

So the question I have been asking myself lately is: how connected is consciousness to the physical substrate? And furthermore, how important is consciousness to what we consider intelligence to be? If consciousness is important to intelligence, then artificial intelligence may not be achievable through software alone — it may require consciousness, which may in turn require a different kind of computing system, one which is more connected (through bidirectional feedback) to the physical quantum substrate of the universe.

What all this means to me is that human beings may form an important and potentially irreplaceable part of the OM — the One Machine — the emerging global superorganism. Today, humans are still its most intelligent parts. But in the future, when machine intelligence may exceed human intelligence a billionfold, humans may still be the only, or at least the most, conscious parts of the system. Because of the human capacity for consciousness (animals and insects are conscious too), I think we have an important role to play in the emerging superorganism. We are its awareness. We are the ones who watch, feel, and ultimately know what it is thinking and doing.

Because humans are the actual witnesses and knowers of what the OM does and thinks, the function of the OM will very likely be to serve and amplify humans, rather than to replace them. It will be a system that is comprised of humans and machines working together, for human benefit, not for machine benefit. This is a very different future outlook than that of people who predict a kind of “Terminator-esque” future in which machines get smart enough to exterminate the human race. It won’t happen that way. Machines will very likely not get that smart for a long time, if ever, because they are not going to be conscious. I think we should be much more afraid of humans exterminating humanity than of machines doing it.

So to get to Kevin Kelly’s Level IV, what he calls “An Intelligent Conscious Superorganism,” we simply have to include humans in the system. Machines alone are not, and will not ever be, enough to get us there. I don’t believe consciousness can be synthesized, or that it will suddenly appear in a suitably complex computer program. I think it is a property of the substrate, and computer programs are just too many levels removed from the substrate. Now, it is possible that we might devise a new kind of computer architecture — one which is much more connected to the quantum field. Perhaps in such a system, consciousness, like electricity, could be embodied. That’s a possibility. It is likely that such a system would be more biological in nature, but that’s just a guess. It’s an interesting direction for research.

In any case, if we are willing to include humans in the global superorganism — the OM, the One Machine — then we are already at Kevin Kelly’s Level IV. If we are not willing to include them, then I don’t think we will reach Level IV anytime soon, or perhaps ever.

It is also important to note that consciousness has many levels, just like intelligence. There is basic raw consciousness, which simply perceives the qualia of what takes place. But there are also forms of consciousness which are more powerful — for example, consciousness that is aware of itself, consciousness which is so highly tuned that it has much higher resolution, and consciousness which is aware of the physical substrate and its qualities of being spacelike and empty of any kind of fundamental existence. These are in fact the qualities of the quantum substrate we live in. Interestingly, they are also the qualities of reality that Buddhist masters point out to be the ultimate nature of reality and of the mind (they do not consider reality and mind to be two different things ultimately). Consciousness may or may not be aware of these qualities of consciousness and of reality itself — consciousness can be dull, or low-grade, or simply not awake. The degree to which consciousness is aware of the substrate is a way to measure the grade of consciousness taking place. We might call this dimension of consciousness “resolution.” The higher the resolution of consciousness, the more acutely aware it is of the actual nature of phenomena, the substrate. At the highest resolution it can directly perceive the space-like, mind-like, quantum nature of what it observes. At that level there is no perception of duality between observer and observed — consciousness perceives everything to be essentially consciousness appearing in different forms and behaving in a quantum fashion.

Another dimension of consciousness that is important to consider is what we could call “unity.” At the lowest level of the unity scale, there is no sense of unity, but rather a sense of extreme isolation or individuality. At the highest level of the scale there is a sense of total unification of everything within one field of consciousness. That highest level corresponds to what we could call “omniscience.” The Buddhist concept of spiritual enlightenment is essentially consciousness that has evolved to both the highest level of resolution and the highest level of unity.

The global superorganism is already conscious, in my opinion, but it has not achieved very high resolution or unity. This is because most humans, and most human groups and organizations, have only been able to achieve the most basic levels of consciousness themselves. Since humans, and groups of humans, comprise the consciousness of the global superorganism, our individual and collective conscious evolution is directly related to the conscious evolution of the superorganism as a whole. This is why it is important for individuals and groups to work on their own consciousnesses. Consciousness is “there” as a basic property of the physical substrate, but like mass or energy, it can be channelled, accumulated and shaped. Currently the consciousness that is present in us as individuals, and in groups of us, is at best nascent and underdeveloped.

In our young, dualistic, materialistic, and externally obsessed civilization, we have made very little progress on working with consciousness. Instead we have focused most or all of our energy on working with the more material-seeming aspects of the substrate — space, time and energy. In my opinion a civilization becomes fully mature when it spends equal if not more time on the consciousness dimension of the substrate. That is something we are just beginning to work on, thanks to the strangeness of quantum mechanics breaking our classical physical paradigms and forcing us to admit that consciousness might play a role in our reality.

But there are ways to speed up the evolution of individual and collective consciousness, and in doing so we can advance our civilization as a whole. I have lately been writing and speaking about this in more detail.

On an individual level, one way to rapidly develop our own consciousness is the path of meditation and spirituality — this is the most important and effective. There may also be technological improvements, such as augmented reality or sensory augmentation, that can improve how we perceive and what we perceive. In the not too distant future we will probably have the opportunity to dramatically improve the range and resolution of our sense organs using computers or biological means. We may even develop new senses that we cannot imagine yet. In addition, using the Internet for example, we will be able to be aware of more things at once than ever before. But ultimately, the scope of our individual consciousness has to develop on an internal level in order to truly reach higher levels of resolution and unity. Machine augmentation can help perhaps, but it is not a substitute for actually increasing the capacity of our consciousnesses. For example, if we use machines to get access to vastly more data, but our consciousnesses remain at a relatively low-capacity level, we may not be able to integrate or make use of all that new data anyway.

It is a well known fact that the brain filters out most of the information our senses take in. Furthermore, when taking a hallucinogenic drug, the filter opens up a little wider, and people become aware of things which were there all along but which they previously filtered out. Widening the scope of consciousness (increasing the resolution and unity of consciousness) is akin to what happens when taking such a drug, except that it is not a temporary effect and it is more controllable and functional on a day-to-day basis. Many great Tibetan lamas I know seem to have accomplished this — the scope of their consciousness is quite vast, and the resolution is quite precise. They literally can and do see every detail of even the smallest things, and at the same time they have very little or no sense of individuality. The lack of individuality seems to remove certain barriers, which in turn enables them to perceive things that happen beyond the scope of what would normally be considered their own minds — for example, they may be able to perceive the thoughts of others, or see what is happening in other places or times. This seems to take place because they have increased the resolution and unity of their consciousnesses.

On a collective level, there are also things we can do to make groups, organizations and communities more conscious. In particular, we can build systems that do for groups what the “self construct” does for individuals.

The self is an illusion. And that’s good news. If it weren’t an illusion we could never see through it, and so, for one thing, spiritual enlightenment would not be possible to achieve. Furthermore, if it weren’t an illusion we could never hope to synthesize it for machines, or for large collectives. The fact that “self” is an illusion is something that Buddhists, neuroscientists, and cognitive scientists all seem to agree on. The self is an illusion, a mere mental construct. But it’s a very useful one, when applied in the right way. Without some concept of self, we humans would find it difficult to communicate or even navigate down the street. Similarly, without some concept of self, groups, organizations and communities cannot function very productively.

The self construct provides an entity with a model of itself and its environment. This model includes what is taking place “inside” and what is taking place “outside” of what is considered to be self, or “me.” By creating this artificial boundary, and modelling what is taking place on both sides of it, the self construct is able to measure and plan behavior, and to enable a system to adjust and adapt to “itself” and to the external environment. Entities that have a self construct are able to behave far more intelligently than those which do not. For example, consider the difference between the intelligence of a dog and that of a human. Much of this is really a difference in the sophistication of the self-constructs of these two species. Human selves are far more self-aware, introspective, and sophisticated than those of dogs. The two species are equally conscious, but humans have more developed self-constructs. This applies to simple AI programs as well, and to collective intelligences such as workgroups, enterprises, and online communities. The more sophisticated the self-construct, the smarter the system can be.
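
To make the idea of a self construct slightly more concrete, here is a minimal Python sketch. It is purely illustrative and entirely hypothetical (not code from any real system): it just models a boundary between an internal model and an external model, plus a trivial planning step that compares goals against the current internal state.

```python
# A hypothetical, minimal "self construct": a model of what is "inside"
# the boundary (internal state) and what is "outside" (the environment),
# plus a simple planning step that compares goals to the internal model.

class SelfConstruct:
    def __init__(self, name):
        self.name = name
        self.internal_model = {}   # what the entity believes about itself
        self.external_model = {}   # what the entity believes about its environment
        self.goals = {}            # desired internal states

    def observe_self(self, key, value):
        self.internal_model[key] = value

    def observe_environment(self, key, value):
        self.external_model[key] = value

    def set_goal(self, key, desired):
        self.goals[key] = desired

    def plan(self):
        # Return the gaps between desired and current internal state.
        # A more sophisticated self-construct would also model the past,
        # project the future, and anticipate external systems.
        return {key: (self.internal_model.get(key), desired)
                for key, desired in self.goals.items()
                if self.internal_model.get(key) != desired}


team = SelfConstruct("workgroup")
team.observe_self("tasks_done", 3)
team.observe_environment("deadline_days", 2)
team.set_goal("tasks_done", 10)
print(team.plan())   # {'tasks_done': (3, 10)}
```

Even at this toy level the point is visible: an entity with an explicit model of “inside” versus “outside” has something to measure, compare, and plan against, which an entity without a self construct does not.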

The key to appropriate and effective application of the self-construct is to develop a healthy self, rather than to eliminate the self entirely. Eradication of the self is a form of nihilism that leads to an inability to function in the world. That is not something that Buddhists or neuroscientists advocate. So what is a healthy self? In an individual, a healthy self is a construct that accurately represents past, present and projected future internal and external state, and that is highly self-aware, rational but not overly so, adaptable, respectful of external systems and other beings, and open to learning and changing to fit new situations. The same is true for a healthy collective self. However, most individuals today do not have healthy selves — they have highly deluded, unhealthy self-constructs. This in turn is reflected in the higher-order self-constructs of the groups, organizations and communities we build.

One of the most important things we can work on now is creating systems that provide collectives — groups, organizations and communities — with sophisticated, healthy, virtual selves. These virtual selves provide collectives with a mirror of themselves. Having a mirror enables the members of those systems to see the whole, and how they fit into it. Once they can see this they can begin to adjust their own behavior to fit what the whole is trying to do. This simple mirroring function can catalyze dramatic new levels of self-organization and synchrony in what would otherwise be a totally chaotic “crowd” of individual entities.
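
As a toy illustration of why the mirror matters (again, hypothetical code, not any real system), here is a small simulation in Python: each member of a collective nudges its own state toward a shared aggregate view when one is available. With the mirror the members converge; without it they remain a scattered crowd.

```python
# A toy model of the "mirroring" idea: members who can see an aggregate view
# of the whole adjust toward it; members who cannot just drift randomly.

import random

def simulate(num_members=50, steps=20, mirror=True, pull=0.3, seed=1):
    random.seed(seed)
    states = [random.uniform(0, 100) for _ in range(num_members)]
    for _ in range(steps):
        if mirror:
            whole = sum(states) / len(states)                     # the shared "mirror"
            states = [s + pull * (whole - s) for s in states]     # adjust toward the whole
        else:
            states = [s + random.uniform(-1, 1) for s in states]  # no shared view: drift
    return max(states) - min(states)   # spread: smaller means more synchrony

print("spread with mirror:   ", round(simulate(mirror=True), 2))
print("spread without mirror:", round(simulate(mirror=False), 2))
```

The interesting design question for real systems is what the aggregate view should contain; a simple average is obviously far too crude for an actual group, organization or community.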

In fact, I think that collectives move through three levels of development:

  • Level 1: Crowds. Crowds are collectives in which the individuals are not aware of the whole and in which there is no unified sense of identity or purpose. Nevertheless crowds do intelligent things. Consider, for example, schools of fish or flocks of birds. There is no single leader, yet the individuals, by adapting to what their nearby neighbors are doing, behave collectively as a single entity of sorts (there is a small sketch of this kind of local rule after this list). Crowds are amoebic entities that ooze around in a bloblike fashion. They are not that different from physical models of gases.
  • Level 2: Groups. Groups are the next step up from crowds. Groups have some form of structure, which usually includes a system for command and control. They are more organized. Groups are capable of much more directed and intelligent behaviors. Families, cities, workgroups, sports teams, armies, universities, corporations, and nations are examples of groups. Most groups have intelligences that are roughly similar to those of simple animals. They may have a primitive sense of identity and self, and on the basis of that, they are capable of planning and acting in a more coordinated fashion.
  • Level 3: Meta-Individuals. The highest level of collective intelligence is the meta-individual. This emerges when what was once a crowd of separate individuals evolves to become a new individual in its own right, and it is facilitated by the formation of a sophisticated meta-level self-construct for the collective. This evolutionary leap is called a metasystem transition — the parts join together to form a new higher-order whole that is made of the parts themselves. This new whole resembles the parts, but transcends their abilities. To evolve a collective to the level of being a true individual, it has to have a well-designed nervous system, it has to have a collective brain and mind, and most importantly it has to achieve a high level of collective consciousness. High-level collective consciousness requires a sophisticated collective self construct to serve as a catalyst. Fortunately, this is something we can actually build, because as has been asserted previously, the self is an illusion, a construct, and therefore selves can be built, even for large collectives comprised of millions or billions of members.
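
To illustrate the crowd level described in the list above, here is a minimal sketch of the classic local-alignment rule behind flocking models: each individual turns toward the average heading of its nearby neighbors, with no leader and no model of the whole. The code is illustrative only and the parameters are arbitrary.

```python
# Crowd-level behavior from purely local rules: each individual aligns its
# heading with nearby neighbors. No leader, no representation of the whole.

import math
import random

def flock_step(positions, headings, radius=10.0, turn=0.2):
    new_headings = []
    for i, (x, y) in enumerate(positions):
        # Headings of neighbors within `radius` (always includes the individual itself).
        nearby = [headings[j] for j, (nx, ny) in enumerate(positions)
                  if (nx - x) ** 2 + (ny - y) ** 2 <= radius ** 2]
        # Circular mean of neighbor headings.
        avg = math.atan2(sum(math.sin(h) for h in nearby),
                         sum(math.cos(h) for h in nearby))
        # Turn partway toward the average, handling angle wrap-around.
        diff = math.atan2(math.sin(avg - headings[i]), math.cos(avg - headings[i]))
        new_headings.append(headings[i] + turn * diff)
    # Everyone moves one step in its new heading direction.
    new_positions = [(x + math.cos(h), y + math.sin(h))
                     for (x, y), h in zip(positions, new_headings)]
    return new_positions, new_headings

random.seed(0)
positions = [(random.uniform(0, 50), random.uniform(0, 50)) for _ in range(30)]
headings = [random.uniform(-math.pi, math.pi) for _ in range(30)]
for _ in range(100):
    positions, headings = flock_step(positions, headings)
# After many steps, headings in each local cluster line up: coherent motion
# emerges even though nothing in the system models the flock as a whole.
```

Nothing in this system represents the flock as a whole, which is exactly what distinguishes a crowd from a group or a meta-individual.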

The global superorganism has been called The Global Brain for over a century by a stream of forward-looking thinkers. Today we may start calling it the One Machine, or the OM, or something else. But in any event, I think the most important work we can do to make it smarter is to provide it with a more developed and accurate sense of collective self. To do this we might start by working on ways to provide smaller collectives with better selves — for example, groups, teams, enterprises and online communities. Can we provide them with dashboards and systems which catalyze greater collective awareness and self-organization? I really believe this is possible, and I am certain there are technological advances that can support this goal. That is what I’m working on with my own project, Twine.com. But this is just the beginning.