
20010324 Four (or three) standards that will bootstrap the SW

Lots of fun things are happening right now. Work is being done both in areas that belong directly to the SW and in related ones. New standards, formats, syntaxes, ideas, etc. have begun to emerge (or always have been emerging), and this is happening in all the layers of the SW. At first, this looks like "standard chaos", but then one realizes that it is natural and important. I believe that if only we could bootstrap the SW, things would fall into place smoothly. This will only be possible if the theory and techniques that build the SW can overcome the problems it faces in the bootstrap process. If it starts, it will never stop!

The world will always be filled with different and even conflicting standards. There is always a need for basic, global standards (e.g. URI) so that other standards can be built on top of them. This is also important for the SW to start.

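I propose that these are the standards that will bootstrap the SW: a standard triple format, a standard conversion language for triple syntaxes, a common schema/ontology representation, and a conversion language for schemas.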

These will be explained pairwise below.

A standard triple format and a standard conversion language for triples

Triples are the basic information entity on the SW. Personal robots will extract triples from annotated Web pages, B2B software will collect triples from company databases, and multimedia-finding robots will extract triples from image/audio/video resources. Naturally, these triples will often not be expressed in the same format/syntax. In the beginning of the SW evolution, lots of ad hoc solutions will be used to attach triples to resources, so one global serialization syntax will not be possible. The only thing in common is the data model (or the possibility to map onto triples). When building an application (e.g. a robot), wouldn't it be great if you could design it to accept all kinds of triple serialization syntaxes? As you know, this is not practically possible (you would have to implement conversion functions for all formats), but you don't want to restrict your application to only a small set of triple syntaxes, and what about future formats?

What we need are two standards: a standard base triple representation (format) and a standard conversion language. This will enable applications to consume any triple syntax, but there are some requirements that need to be fulfilled. One problem is: how do you identify a triple serialization format? N3, RDF, or any other serialization has to be identifiable by *every* application that consumes triples. The only way to accomplish this is to include a triple-syntax identifier before the triples. This identifier should be a URL pointing to conversion instructions, written in the standard conversion language, that describe how to convert this serialization syntax into the basic triple format. These instructions should be "understood" by every application that needs to use a diverse set of serialization syntaxes.

Requirements to achieve this:

If all these requirements are fulfilled and "format producers" follow these rules, things would be great! Applications would be able to use triples written in any serialization syntax and, most importantly, would be able to consume future formats.
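
To make the idea a bit more concrete, here is a minimal sketch in Python of an application that reads the syntax identifier first and then converts the rest to a base triple form. Everything in it is an assumption made for illustration: the two toy syntaxes, the example.org identifier URLs, and the function names are hypothetical, and a plain lookup table of Python functions stands in for fetching and interpreting real conversion instructions written in a standard conversion language (which does not exist yet).

```python
from typing import Callable, Dict, List, Tuple

# Assumed "base triple format": a plain (subject, predicate, object) tuple.
Triple = Tuple[str, str, str]


def convert_line_syntax(body: str) -> List[Triple]:
    """Hypothetical syntax A: one triple per line, terms separated by whitespace."""
    triples = []
    for line in body.splitlines():
        parts = line.split()
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def convert_pipe_syntax(body: str) -> List[Triple]:
    """Hypothetical syntax B: 'subject | predicate | object', one per line."""
    triples = []
    for line in body.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


# Stand-in for "a URL to conversion instructions". In the proposal, the
# application would fetch the instructions from the URL and interpret them;
# here the URL is only a key into a local table (an assumption of this sketch).
CONVERTERS: Dict[str, Callable[[str], List[Triple]]] = {
    "http://example.org/syntax/lines": convert_line_syntax,
    "http://example.org/syntax/pipes": convert_pipe_syntax,
}


def read_triples(document: str) -> List[Triple]:
    """Read the triple-syntax identifier on the first line, then convert the rest."""
    identifier, _, body = document.partition("\n")
    converter = CONVERTERS.get(identifier.strip())
    if converter is None:
        raise ValueError(f"no conversion instructions for {identifier.strip()!r}")
    return converter(body)


doc_a = "http://example.org/syntax/lines\npage1 author alice\n"
doc_b = "http://example.org/syntax/pipes\npage1 | author | alice\n"

# Both documents end up as the same base triples, whatever syntax they used.
print(read_triples(doc_a))
print(read_triples(doc_b))
```

The point of the sketch is only that the application code (read_triples) never mentions a concrete syntax. Supporting a future format means publishing new conversion instructions, not changing the application.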

Partial understanding and conversion languages

Okay, now we can build applications that are syntax independent, but how about understanding? One of the most important things on the SW is partial understanding. Today, applications can communicate and exchange data only if they have a *total* agreement on syntaxes and semantics, and this is why things are so messy! Why do we need, for example, to choose "save as" and then "Word 6.0" to exchange a Word document, when the text is mapped onto the format in almost the same way?

Back to the SW. The first thing to decide is: on what level should this partial understanding occur? At the schema, ontology, or logical level? Should a powerful heuristic AI monster achieve partial understanding when exchanging triples with a "stupid" search engine? Yes, but what about the other way around? This is something that I won't go into here and now (another day perhaps), but from now on, suppose we have decided this.

As you know, there are quite a few different types of schemas. Basically, all these different types of schemas overlap on some concepts and are disjoint on others. (This is not something that I have looked into.) One could express in RDF(S) all the things one could express in XML Schemas (talking XML here), but not the other way around. But does this mean that an application that "understands" XML Schemas should be excluded from the RDF world? No, not if we could reach partial understanding. Two things could achieve this partial understanding: a common ontology-serialization/schema-construct (don't know what to name it) and a conversion language. Every application that wants to be a part of this wonderful world of partial understanding needs to "understand" a common format/structure/model that expresses some level of semantics, and a conversion language that could convert (sometimes partially) any schema language into that form. E.g. if I construct some RDF(S), an application that only "understands" XML Schemas should be able to partially understand its meaning by mapping parts of the RDF(S) to the basic standard representation and then perhaps to XML Schemas if needed. What is understood depends on how much semantic overlap is present.

Requirements to achieve this:

This will enable applications to become somewhat "schema independent" and to use this to achieve partial understanding. Perhaps the two conversion languages could be combined; that would make it three standards instead of four. (Returning to this later.)
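
As a rough illustration of partial understanding at the schema level, here is a small Python sketch. It assumes schema statements can be written as triples and that each schema language comes with a table mapping its constructs onto a common representation. The tables below are my own toy assumptions: the common concept names are invented, the right-hand labels in the second table are loose names for XML Schema ideas rather than real identifiers, and the mappings are not claimed to be semantically exact.

```python
from typing import Dict, List, Tuple

# A schema statement written as a triple: (subject, construct, object).
Statement = Tuple[str, str, str]

# Hypothetical mapping from some RDF(S) constructs to invented common concepts.
RDFS_TO_COMMON: Dict[str, str] = {
    "rdfs:subClassOf": "subtype-of",
    "rdfs:range": "value-type",
    "rdfs:subPropertyOf": "subproperty-of",
}

# Hypothetical mapping from the common concepts to XML Schema ideas.
# There is deliberately no entry for "subproperty-of" in this toy table.
COMMON_TO_XML_SCHEMA: Dict[str, str] = {
    "subtype-of": "extension-of",
    "value-type": "element-type",
}


def translate(
    statements: List[Statement], table: Dict[str, str]
) -> Tuple[List[Statement], List[Statement]]:
    """Translate the statements whose construct is in the table.

    Returns (understood, not_understood); nothing is silently dropped.
    """
    understood, not_understood = [], []
    for subject, construct, obj in statements:
        if construct in table:
            understood.append((subject, table[construct], obj))
        else:
            not_understood.append((subject, construct, obj))
    return understood, not_understood


rdfs_statements: List[Statement] = [
    ("Student", "rdfs:subClassOf", "Person"),
    ("age", "rdfs:range", "Integer"),
    ("nickname", "rdfs:subPropertyOf", "name"),
]

# Step 1: RDF(S) to the common representation.
common, lost_in_step_1 = translate(rdfs_statements, RDFS_TO_COMMON)
# Step 2: common representation to what an XML Schema application "understands".
xml_schema_view, lost_in_step_2 = translate(common, COMMON_TO_XML_SCHEMA)

print("understood:", xml_schema_view)
print("not understood:", lost_in_step_1 + lost_in_step_2)
```

How much ends up in the "understood" list is exactly the semantic overlap between the two tables. The application still gets something useful out of the exchange even though it does not get everything.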

NOTE! Partial understanding could be achieved at many levels. In the above, I have biased the text to describe the lowest level of understanding: schemas.

Too tired to continue. Will return and finish up. Please send me comments! This is a product of brainstorming!
