One problem with the information on the Web today is that the statements it consists of lack precision. Information is usually written for a certain audience, in a certain context, in a certain domain, and about a certain topic. In that original setting the statements are precise, because the words used have a precise meaning with respect to the particular domain, context or audience. This information about the domain, the context and the assumed audience is usually implicit. A book, for example, has a cover that encapsulates the information and labels it with a title, a categorization, etc. Information in a book thus has a certain amount of precision, achieved by the book's encapsulation. But as soon as the information from, e.g., a book is published directly on the Web, its precision is lost, since the book's physical encapsulation cannot be represented on the Web. Moreover, the hypertext format, "discovered" by Ted Nelson, makes it possible to read information in a non-sequential way: one cannot assume that a person reading a digitized book starts at the beginning and reads it through to the end. A user of a search engine will most probably jump right into a chapter without knowing the domain, the topic, or the intended audience. It therefore becomes impossible for a search engine to disambiguate the meaning of words or statements, because it does not know the context or domain they were originally written in. Since we can't use physical things like a book cover to achieve precision, we need something else. So, how can we use the Web so that precision, at least in certain respects, is preserved? The answer is quite obvious: use URIs to identify important things, and more generally use metadata (as a form of book cover). As Tim Berners-Lee puts it: "everything of importance should have a URI" [Berners-Lee, TLK].
This includes not only documents, or parts of documents, retrievable over the Internet, but also abstract things and things that cannot be represented digitally, such as people, feelings and properties, and, most of all, concepts. This might at first seem a bit odd: "Huh? Should people have URIs?" To see that it is not so strange, let us look at an example.
Imagine that you are doing "ego searching", i.e. looking for information about yourself on the Web. How do you express that in a keyword search engine? First you might use your first and last name, but as the list of results exceeds tens of thousands of links to people who are not you, you might add your city and country. That, however, has the effect that pages where only your name is used will no longer appear. Now, it is quite obvious that if people had a URI, specifically a URN such as people:sv:19740507-2342, or simply a permanent e-mail address, the search would simply be a matter of entering your URI instead, provided that the information used it to make precise statements. Then all the results would be "good" answers. Taking my own name, Benny, as an example, this does not mean that the name should be replaced by a URI in the actual text; rather, there should be metadata that attaches the unique URI to the name. We don't change the way the text appears to human readers; we just annotate it (i.e. tag it) with URIs, making the statement more precise.
I read an article by <Benny="http://purl.org/net/benny"/> about ...
I read an article by Benny about ...
In the simplified example above, the first line is how the statement is actually represented at the syntax level (how computers will read it), and the line below is how it will be presented to humans reading the document. The statement is thus precise in one respect: it states exactly who Benny is.
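The annotation idea can be made concrete with a small Python sketch. It treats the simplified `<Name="URI"/>` notation from the example as if it were a real syntax (it is not a standard; a real system would use an established syntax such as RDFa), strips the annotations to produce the human-readable line, and contrasts a plain keyword search with a search by URI. The function names and the second URI, `http://example.org/people/another-benny`, are hypothetical and chosen only for illustration.

```python
import re

# Matches the essay's simplified inline annotation, e.g.
# <Benny="http://purl.org/net/benny"/>. This is NOT a standard syntax.
ANNOTATION = re.compile(r'<([^=<>"]+)="([^"]+)"/>')


def render_for_humans(markup: str) -> str:
    """Return the text as a human reader sees it: annotations become plain names."""
    return ANNOTATION.sub(lambda m: m.group(1), markup)


def annotated_uris(markup: str) -> set[str]:
    """Return the URIs attached to names in the text (what a machine reads)."""
    return {uri for _name, uri in ANNOTATION.findall(markup)}


def keyword_search(docs: list[str], word: str) -> list[str]:
    """Naive keyword search over the human-readable text only."""
    return [d for d in docs if word in render_for_humans(d).split()]


def uri_search(docs: list[str], uri: str) -> list[str]:
    """Precise search: only documents whose metadata attaches this URI."""
    return [d for d in docs if uri in annotated_uris(d)]


if __name__ == "__main__":
    docs = [
        'I read an article by <Benny="http://purl.org/net/benny"/> about ...',
        # Hypothetical page about a different person with the same first name.
        'I met <Benny="http://example.org/people/another-benny"/> in town.',
    ]
    print(render_for_humans(docs[0]))
    print(len(keyword_search(docs, "Benny")))
    print(len(uri_search(docs, "http://purl.org/net/benny")))
```

Running it prints the human-readable sentence, then 2 keyword hits versus 1 URI hit: the keyword search cannot tell the two Bennys apart, while the URI search can, even though both pages look identical to a human reader at the name.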
Since we can't use book covers to encapsulate information on the Web, we need to create an imaginary book cover, and attaching metadata to the information does this. A page that is snatched from a book should at least have some metadata about who wrote it, what it is about, etc. Meaning lies in the context, and the context is generally also implicit in the information. Because the context in which information was created was not represented along with it, that context was partly the source of the imprecision. Using URIs solves this by making the things they identify context independent: a given URI means the same thing regardless of whether you use it in a different country or in a different domain, and this is the most important property of URIs. Achieving a semantic structure is about putting information items on the Web in context relative to one another. The task is to describe what a given information item "means" in a computer-processable way.
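One way to picture the "imaginary book cover" is as a set of statements about the page, where the page, each property and, where possible, each value is identified by a URI. The Python sketch below assumes a simple (subject, property, value) triple model in the spirit of RDF and uses the Dublin Core element URIs for the properties; the page URI `http://example.org/book/chapter-3`, the sample values and the helper `book_cover` are hypothetical. Because both parties use the same URIs, their independently written metadata can be merged by simple concatenation, and the page carries its context wherever it ends up.

```python
from collections import defaultdict

# A statement is a (subject, property, value) triple, loosely modelled on RDF.
Triple = tuple[str, str, str]

# Dublin Core element set namespace; properties are full URIs, not bare words.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical URI for one chapter "snatched" from a digitized book.
PAGE = "http://example.org/book/chapter-3"

# Metadata written by one party (e.g. a publisher); values are illustrative.
publisher_says: list[Triple] = [
    (PAGE, DC + "creator", "http://purl.org/net/benny"),
    (PAGE, DC + "title", "Chapter 3"),
]

# ...and by another party, possibly in another country or domain.
library_says: list[Triple] = [
    (PAGE, DC + "subject", "Semantic Web"),
    (PAGE, DC + "language", "en"),
]


def book_cover(triples: list[Triple], subject: str) -> dict[str, list[str]]:
    """Collect every statement about `subject`, grouped by property URI."""
    cover: defaultdict[str, list[str]] = defaultdict(list)
    for s, p, v in triples:
        if s == subject:
            cover[p].append(v)
    return dict(cover)


if __name__ == "__main__":
    # Same URIs on both sides, so merging the two sources is just concatenation.
    for prop, values in book_cover(publisher_says + library_says, PAGE).items():
        print(prop, values)
```

The design choice worth noting is that nothing in `book_cover` needs to know who wrote which statement: agreement on URIs, not agreement on a shared document, is what lets the two descriptions combine.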
The meaning of a document on the Web can be defined more precisely than that of an ordinary paper document [Berners-Lee, ME]. This stems from the fact that the Web (the URI space) is a global namespace, in which the meaning of information items becomes worldwide and unambiguous if the information declares which language it is written in and which concepts it uses [Berners-Lee, 2000].
What this is all about is changing the way information is expressed on the Web so that the information becomes as worldwide as the Web itself. This is done by creating "artificial" contexts and by requiring that information always declare which (artificial) language it is written in. (Remember that this is not about the information that humans should read, but about the information that computers should read.)
[Berners-Lee, TLK] Berners-Lee T, 2000: XML 2000 Keynote Slides, URL: http://www.w3.org/2000/Talks/1206-xml2k-tbl/
[Berners-Lee, ME] Berners-Lee T, 2000: Meaning, URL: http://www.w3.org/DesignIssues/Meaning.html
[Berners-Lee, 2000] Berners-Lee T, 2000: Weaving the Web, Harper Business