How Bottlenose Works – A Glimpse Under the Hood

July 23rd, 2012


This is the third article in a series about the Bottlenose Public Beta launch:

Bottlenose – The Now Engine – The Web’s Collective Consciousness Just Got Smarter

The Bottlenose Business Concept

How Bottlenose Works – A Glimpse Under the Hood (you are here)

How Bottlenose Could Improve the Media and Enable Smarter Collective Intelligence

The StreamOS

Let’s talk about the technology behind Bottlenose. There’s an interesting technical accomplishment under the hood, and it’s a big part of the story.

To make sense of social media in real-time, and to measure collective consciousness right as it happens, we needed an incredibly powerful, incredibly fast, massively scalable system.

We’re doing something that has never been attempted before… real-time, big data analytics and discovery about every topic in the world.

As a small scrappy startup on a shoestring budget we could not afford thousands of servers and millions of dollars in IT infrastructure. We had to find another way to do this. And we did.

And it turns out it might even be the optimal solution. Natural selection gave us no other choice – we had to engineer around our circumstances.

So Dominiek (my co-founder and CTO) and I took the brain as our guide. The brain is a massively distributed computing network: not a central, monolithic server farm, but a completely distributed peer-computing network.

That’s how we have architected Bottlenose. That’s how Bottlenose does all the computing it does so fast, and as you will see below… with so little infrastructure in the middle.

We call this approach a “synaptic web.”

Just like the brain, in the synaptic web, the intelligence is in the connections. A vast network of simple elements doing simple computations can be networked together to do amazingly powerful computations.

This is how Bottlenose works.

Under the hood we built a new engine for computing on streams, called the StreamOS, and we wrote it completely in JavaScript and HTML5.

 

 

The StreamOS does a lot of things, like real-time natural language processing and semantic classification, topic detection, analytics, sentiment analysis, trend detection, personalization, visualization, profiling, computation, storage, networking, messaging, and much more…

It’s all built from the ground up to deal with the messiness of the social web: messages that are missing vowels or that contain hashtags, abbreviations, @replies, short URLs, and so on. These don’t look like well-formed grammatical messages.
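To make that messiness concrete, here is a minimal sketch of how a message could be split into typed tokens before any further analysis. It is an illustration only, not the StreamOS code: the function names (`tokenizeMessage`, `classifyToken`) and the patterns are assumptions made up for this example.

```javascript
// Hypothetical sketch, not the StreamOS implementation.
// The token types and patterns below are illustrative assumptions.
const TOKEN_PATTERNS = [
  { type: "url", regex: /^https?:\/\/\S+/i },
  { type: "mention", regex: /^@[\p{L}\p{N}_]+/u },
  { type: "hashtag", regex: /^#[\p{L}\p{N}_]+/u },
];

// Classify one whitespace-separated chunk of a message.
function classifyToken(raw) {
  for (const { type, regex } of TOKEN_PATTERNS) {
    const match = raw.match(regex);
    if (match) {
      // Keep only the matched part, so "#bigdata!!" becomes "#bigdata".
      return { type, text: match[0] };
    }
  }
  // Ordinary word: strip surrounding punctuation and lowercase it.
  const word = raw
    .replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, "")
    .toLowerCase();
  return word ? { type: "word", text: word } : null;
}

// Split a message into typed tokens, dropping chunks that are pure punctuation.
function tokenizeMessage(message) {
  return message
    .split(/\s+/)
    .filter((part) => part.length > 0)
    .map(classifyToken)
    .filter((token) => token !== null);
}
```

Calling `tokenizeMessage("@anna gr8 talk on #bigdata!! http://t.co/abc")` yields a mention, three words, a hashtag, and a URL. Abbreviations such as "gr8" pass through as plain words here; expanding them would need a dictionary or model that this sketch does not attempt.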

But what’s most interesting about the StreamOS is that, after years of optimization, we have gotten the StreamOS engine down to the size of a photo. And it runs in a distributed fashion, in your browser, when you are at Bottlenose.

So when you are using Bottlenose, the whole experience is computed on your device, for you.

Everything you see is actually happening in your browser, on your computer… not on a server farm somewhere. It’s not really a website, it’s a distributed application.

This is truly Web 3.0, or what Roger McNamee calls “The Hypernet.”

Roger wrote a good article defining the Hypernet that I think is an important read – he says it very well.

Bottlenose is an example of the Hypernet. It’s a higher production-value experience that is only possible with HTML5 and JavaScript. It’s fully distributed, and not exactly a website: it acts sort of like a site, but it’s actually an app. And instead of living in one central place, like a site, it’s distributed to the edges of the network.

Bottlenose effectively extends the cloud all the way to the edge of the network. It’s a giant grid computing system, and everyone can be part of it.

The Crowd is The Computer

In Bottlenose, the crowd IS the computer.

We call this “crowd computing.”

 

 

Bottlenose measures the crowd’s attention, and the crowd actually does the measuring. That’s how the brain works, too.

When you are at Bottlenose, your browser is like a neuron in a giant network that is helping to compute the collective consciousness of the net in real-time.

Your node only computes your experience for you, not for anyone else, and your data is completely private all the time, but if your node makes a public discovery like a new trend, those analytics get shared back and the whole network benefits.
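As a rough illustration of that split between private data and shared discoveries, the sketch below has a node reduce its own stream to aggregate topic counts and report only those that come from public messages. The message shape (`isPublic`, `topics`), the `minMentions` threshold, and the function name are all assumptions for this example, not a description of how Bottlenose actually decides what to share.

```javascript
// Hypothetical sketch, not the Bottlenose protocol.
// Assumes each message looks like { isPublic: boolean, topics: string[], ... }.
function buildSharedSummary(messages, minMentions = 2) {
  const counts = new Map();
  for (const message of messages) {
    if (!message.isPublic) continue; // private messages never contribute
    // Count each topic at most once per message.
    for (const topic of new Set(message.topics)) {
      counts.set(topic, (counts.get(topic) || 0) + 1);
    }
  }

  // Only aggregate counts are reported; no message text or author is included.
  const trends = [];
  for (const [topic, mentions] of counts) {
    if (mentions >= minMentions) trends.push({ topic, mentions });
  }
  trends.sort(
    (a, b) => b.mentions - a.mentions || a.topic.localeCompare(b.topic)
  );
  return trends;
}
```

The point of the design is that the raw stream stays on the device, and the only thing that travels back is a small list of `{ topic, mentions }` pairs.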

So whereas Google, Facebook, Twitter and others have to build and maintain huge central server farms at great cost, we just compute everywhere in a fully distributed fashion at basically no cost. And it has no noticeable impact on your device: it runs on a laptop, even on an iPad, with no trouble.

In fact the StreamOS is powerful enough to semantically analyze and process between 1,500 and 3,000 messages a second, per browser, on ordinary laptops. In practice nobody receives that many messages, but it’s good to know we have the headroom. And sometimes we do use some of that capacity (for example when we recompute trends and find topics in all your messages and streams).

So, thanks to our patent-pending crowd computing infrastructure, today with almost 60K beta testers, we have almost 60K CPUs in our “crowd” … in other words, we have 60,000 virtual nodes in our datacenter.

Of course not all those 60K people are there concurrently, but those who are present are helping to compute their perspective on the Web’s consciousness with us. It’s actually similar to a giant hologram, in which each node computes a different perspective on the whole.

Now this doesn’t mean we won’t also use the cloud. In fact we do. We do have a small number of actual servers for basic things like login and account creation and such. And we do have an elastic cloud and we can grow that massively especially for large customers who want analytics against the firehose.

But the key is we don’t have to until we need to. And we don’t have to incur those costs until someone is paying us. Yet, we can still provide services for free, at scale, that most companies could not afford to provide at all.

There’s a lot more I can say about our platform and architecture (like the fact that any process can seamlessly move into the cloud too using Node.js, and it can scale up to enterprise levels or down to mobile devices, etc.). There’s a ton of innovation here, and we have 8 pending patents around what we figured out.

But the result of all this computing is our core IP, a vast real-time interest graph about every topic, link, person, company, event, brand, etc. that is being discussed on social networks. In some ways this is reminiscent of Applied Semantics, which was sold to Google (Gil Elbaz, the co-founder of Applied Semantics, is one of our investors in fact).

The interest graph that Bottlenose generates is a phenomenally rich, constantly evolving model of the things and connections happening in the world right now. And it’s personalized too. All our products and services use this graph, and contribute to it.

The graph is the gold. Everyone computes a piece of it, and we put it all together to form a whole. It would be next to impossible to compute that centrally in real-time, or at least it would be extremely expensive. Our distributed approach enables us to “crawl” for this graph in a totally decentralized way, all the time, in real-time — with basically no central load.
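One simple way to picture "everyone computes a piece of it" is each node reporting a small list of weighted edges, with a merge step summing the weights of identical edges. The sketch below assumes exactly that; the edge shape (`from`, `to`, `weight`) and the function names are hypothetical and are not taken from the Bottlenose platform.

```javascript
// Hypothetical sketch, not the Bottlenose graph format.
// Assumes each node reports edges shaped like { from, to, weight }.
function edgeKey(a, b) {
  // Undirected edge: order the endpoints so (a, b) and (b, a) share a key.
  return JSON.stringify(a < b ? [a, b] : [b, a]);
}

// Merge partial graphs from many nodes by summing weights per edge.
function mergePartialGraphs(partials) {
  const merged = new Map();
  for (const partial of partials) {
    for (const { from, to, weight } of partial) {
      const key = edgeKey(from, to);
      merged.set(key, (merged.get(key) || 0) + weight);
    }
  }
  return merged;
}
```

Because summing is associative and commutative, the partial graphs can arrive in any order and be merged incrementally, which is what makes this kind of aggregation cheap on the central side.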

There are numerous uses for this graph that we haven’t even had time to explore, but some might include better personalization, ad targeting, keyword discovery, knowledge discovery, collaboration, marketing analytics, alerting, and much more. We’re just beginning to think about the applications actually.

 

Beyond PageRank – Measuring Social Sharing

Danny Sullivan wrote a wonderful piece over the weekend about the social search problem that calls for social ranking algorithms to overtake link-based popularity contests.

Precisely such an approach is what underlies how Bottlenose detects and measures trends. Bottlenose measures how topics, links, and people are talked about in the social web, and how the conversation moves.

StreamSense™ is our answer to Google’s PageRank. It’s not for the Web, it’s for social messages, but it’s analogous: it measures what’s trending in relevant streams.

StreamSense is how we sense and decide which social media messages are most important right now, for a given topic or term, and within a given cohort or community. It’s an algorithmic approach to measuring what’s happening, what’s connected, and what’s valuable in real-time streams of social activity.
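The details of StreamSense are not described here, so the following is not that algorithm. It is only a generic sketch of one common way to rank messages by "importance right now": count shares, but let each share fade with an exponential half-life. The message shape (`id`, `shares` as millisecond timestamps), the one-hour default half-life, and the function names are all assumptions.

```javascript
// Hypothetical sketch of time-decayed ranking; NOT the StreamSense algorithm.
// Each share contributes 1 when fresh and halves every `halfLifeMs`.
function decayedScore(shareTimestampsMs, nowMs, halfLifeMs) {
  let score = 0;
  for (const sharedAt of shareTimestampsMs) {
    const age = Math.max(0, nowMs - sharedAt); // ignore clock skew into the future
    score += Math.pow(0.5, age / halfLifeMs);
  }
  return score;
}

// Rank messages shaped like { id, shares: number[] } from most to least current.
function rankMessages(messages, nowMs, halfLifeMs = 60 * 60 * 1000) {
  return messages
    .map((m) => ({
      id: m.id,
      score: decayedScore(m.shares, nowMs, halfLifeMs),
    }))
    .sort((a, b) => b.score - a.score);
}
```

With this scoring, a message shared twice in the last hour outranks one shared three times two hours ago, which captures the "right now" emphasis. Restricting the input to a given topic or community would happen before this step.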

 
