The Problem with Google Base and Ning

There is a hidden problem with open databases such as Google Base and Ning — as presently designed — a problem that I have not seen any discussion of yet.

Briefly stated: As the number of unique data schemas created in such systems grows, the probability of applications that use those schemas breaking also grows (perhaps exponentially).

Here’s why:

Let’s say that Sue creates a new schema in Ning (or Google Base) for a "Person." She makes an app that uses this record structure. Now Joe makes a calendar app that takes Sue’s Person record and connects it with his own unique "Event" record schema. Joe’s app relies on Sue’s Person schema to work. Next, Bob makes a To-Do list app that uses Joe’s Event schema and Sue’s Person schema and pumps out "To-Do-Entry" records. Finally, Lisa creates a project manager app that uses Sue’s Person schema, Joe’s Event schema, and Bob’s To-Do-Entry schema to pump out "Project" records.

So we have a network of apps that rely on data schemas from other apps. Next, let’s say that Sue decides to change one of the attribute-value pairs in her Person schema, perhaps changing an attribute to map to a string instead of an integer value. That one simple change has huge ripple effects. First it causes Joe’s app to break, which then causes Bob’s app to break, which causes Lisa’s app to break, and so on. In other words, we have a chain reaction of broken apps.
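
To make the chain reaction concrete, here is a minimal sketch in Python. It assumes the dependencies between the four schemas above can be written down as a simple map; the DEPENDS_ON table and the affected_by function are hypothetical names invented for illustration and are not part of Ning or Google Base.

```python
from collections import deque

# Hypothetical dependency map: each schema lists the schemas it relies on.
DEPENDS_ON = {
    "Person": [],                                   # Sue
    "Event": ["Person"],                            # Joe
    "To-Do-Entry": ["Event", "Person"],             # Bob
    "Project": ["Person", "Event", "To-Do-Entry"],  # Lisa
}


def affected_by(changed, depends_on):
    """Return every schema that directly or indirectly relies on `changed`."""
    # Invert the map so we can walk from a schema to the schemas built on it.
    dependents = {}
    for name, deps in depends_on.items():
        dependents.setdefault(name, [])
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk over everything downstream of the changed schema.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


print(sorted(affected_by("Person", DEPENDS_ON)))
```

Changing Person reaches all three of the other schemas, while changing Project reaches none, because nothing is built on top of it.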

As the number of unique schemas increases, the likelihood that some schema will be modified in a given time frame also increases. At the extreme end of this curve, with large numbers of users, schemas and apps, the likelihood approaches 100% that at any given time some schema that is directly or indirectly required by a given app will have changed, causing that app to break. In other words, if such services are successful, apps within them will break ever more frequently, causing endless problems for developers.
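
A rough back-of-the-envelope model shows how quickly this likelihood climbs. The sketch below assumes, purely for illustration, that each schema changes independently with the same probability p in a given time frame, and that an app relies directly or indirectly on k schemas. Real schema changes are unlikely to be independent or equally likely, so treat this as an idealization rather than a measurement. The function name break_probability is made up for this example.

```python
def break_probability(p, k):
    """Chance that at least one of k schemas changes, under the
    simplifying assumption that each changes independently with
    probability p in the time frame considered."""
    return 1 - (1 - p) ** k


# Even a small per-schema change rate adds up as dependencies grow.
for k in (1, 10, 100, 1000):
    print(k, round(break_probability(0.01, k), 3))
```

Under these assumptions, an app that relies on 100 schemas, each with a 1% chance of changing, has roughly a 63% chance of being affected in that time frame, and the figure keeps rising toward 100% as k grows.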

From a developer perspective, the only options are either to submit to constantly fixing your apps as they break, or to simply not make use of data produced by other apps on the platform. In the latter case, developers can protect their apps from breaking by "reinventing the wheel" and creating their own schemas for every data structure they wish to use, but the tradeoff is that they then will not be making use of existing content from other apps. The problem with this choice is that, at least in the case of Ning, "re-mixing" of data between apps is the very value proposition of using such a system. Without this capability, why even use such a system instead of running your own database on your own server? So clearly neither pole of this tradeoff is optimal from a developer standpoint.

Systems such as Google Base and Ning present an N-squared integration challenge to developers. In the worst case, every app may have to be continually re-integrated with every other app. But even in the best case, they present unworkable challenges to developers because every app may have to be continually re-integrated with at least a few other apps.

This is the very problem that the Semantic Web was created to solve. The Semantic Web provides tools for data schema integration and interoperability. The base value of RDF and OWL is that they provide a means to define, publish and map between data schemas in an open way. For example, application creators can map their unique schemas to centrally agreed-upon ontologies, enabling the best of both worlds: individual developer freedom and global standards.
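
The mapping idea can be sketched without any RDF tooling at all. The Python below is only an illustration of the principle, not of how RDF or OWL actually express mappings: the local field names for Sue’s and Joe’s apps are invented, and plain dictionaries stand in for real ontology mappings. The shared terms borrow two property names from the FOAF vocabulary (foaf:name and foaf:mbox) simply as familiar examples of agreed-upon terms.

```python
# Hypothetical per-app mappings from local field names to shared terms.
SUE_TO_SHARED = {"full_name": "foaf:name", "email_addr": "foaf:mbox"}
JOE_TO_SHARED = {"attendee": "foaf:name", "contact": "foaf:mbox"}


def to_shared(record, mapping):
    """Translate a local record into shared terms, dropping unmapped fields."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def from_shared(shared_record, mapping):
    """Translate a shared-term record back into one app's local field names."""
    inverse = {shared: local for local, shared in mapping.items()}
    return {inverse[key]: value
            for key, value in shared_record.items() if key in inverse}


sue_record = {"full_name": "Sue", "email_addr": "sue@example.org", "shoe_size": 7}
shared = to_shared(sue_record, SUE_TO_SHARED)
joe_view = from_shared(shared, JOE_TO_SHARED)
print(joe_view)
```

In this arrangement Joe’s app never reads Sue’s field names directly. If Sue renames full_name, only her own mapping has to be updated, and Joe’s app keeps working against the shared terms.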

Of course using ontologies isn’t a magic bullet — it simply pushes the problem to a higher level. If the ontologies are changed, then any apps that rely on them may break. But at least everyone can integrate their apps with one single ontology (or a few) instead of potentially millions of disparate schemas. From a developer standpoint this is a far more manageable problem.
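
The difference in scale is easy to put in numbers. The sketch below counts mappings under two simplified, assumed arrangements: one where every app maintains a mapping to every other app’s schema (the worst case), and one where every app maintains a single mapping to one shared ontology. Both function names are hypothetical and exist only for this illustration.

```python
def pairwise_mappings(n):
    """Directed schema-to-schema mappings if each of n apps maps to every other app."""
    return n * (n - 1)


def shared_ontology_mappings(n):
    """Mappings needed if each of n apps maps only to one shared ontology."""
    return n


for n in (10, 1000):
    print(n, pairwise_mappings(n), shared_ontology_mappings(n))
```

With 1,000 apps, the worst-case pairwise arrangement implies 999,000 mappings to keep current, against 1,000 when everyone maps to a single ontology.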

You can read more about my thoughts on the evolution of a World Wide Database (WWDB) here.