Monday, May 10, 2010

Concept and Document in the Ancient World Semantic Web

This post is really just me taking some notes on semantic web usage. Apologies if it's too discursive, but I'm still at the info-gathering stage.

Along with some colleagues, I've been thinking about the relationships between concrete action and scholarly intent that are inherent in the links we make when creating digital publications.

First, some background. Here's a "test" sentence, along with its HTML.

Themistocles was born in Athens.
Or:
<a href="http://en.wikipedia.org/wiki/Themistocles">Themistocles</a> was born in <a href="http://en.wikipedia.org/wiki/Athens">Athens</a>.


http://en.wikipedia.org/wiki/Athens is a document found on the Internet. As used in our sentence, however, it is a placeholder for Athens as a concept (nebulously defined, I admit). Asking "What is the latitude and longitude of Athens?" brings the issue into focus. It is not useful to respond with the location(s) of the Wikipedia servers. We clearly want the location of the city in "the real world": 37° 58′ 0″ N, 23° 43′ 0″ E.

Links point to documents, but we often mean the underlying concept. Usually this distinction doesn't matter. Sometimes it does, as in:

My source for the longitude and latitude of Athens is the Wikipedia article for Athens.

Marked up with Wikipedia links, that sentence contains the same link twice: once meaning the concept, once meaning the document. Wikipedia provides no mechanism for distinguishing between these meanings.

DBpedia does implement this distinction. But first, here's how the DBpedia website introduces itself:
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data.

In DBpedia, the following URLs are both valid:

  • http://dbpedia.org/resource/Athens
  • http://dbpedia.org/page/Athens

The first refers to the concept, the second is a specific document. This allows for the following useful HTML:
My source for the longitude and latitude of <a href="http://dbpedia.org/resource/Athens">Athens</a> is the <a href="http://dbpedia.org/page/Athens">DBpedia page</a>.
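To make the DBpedia convention concrete, here is a minimal Python sketch that maps between the two URL forms. It assumes, from the Athens example alone, that every concept URI under /resource/ has a matching page under /page/ with the same trailing name; the function names are my own invention, not part of any DBpedia API.

```python
# The two URL prefixes from the DBpedia Athens example above.
RESOURCE_PREFIX = "http://dbpedia.org/resource/"  # names the concept
PAGE_PREFIX = "http://dbpedia.org/page/"          # names the HTML document about it


def page_for_resource(uri: str) -> str:
    """Map a DBpedia concept URI to the page describing it (assumed prefix swap)."""
    if not uri.startswith(RESOURCE_PREFIX):
        raise ValueError(f"not a DBpedia resource URI: {uri}")
    return PAGE_PREFIX + uri[len(RESOURCE_PREFIX):]


def resource_for_page(url: str) -> str:
    """Map a DBpedia page URL back to the concept URI it describes."""
    if not url.startswith(PAGE_PREFIX):
        raise ValueError(f"not a DBpedia page URL: {url}")
    return RESOURCE_PREFIX + url[len(PAGE_PREFIX):]


def kind(url: str) -> str:
    """Classify a URL as 'concept', 'document', or 'unknown' by its DBpedia prefix."""
    if url.startswith(RESOURCE_PREFIX):
        return "concept"
    if url.startswith(PAGE_PREFIX):
        return "document"
    return "unknown"


print(page_for_resource("http://dbpedia.org/resource/Athens"))
print(kind("http://en.wikipedia.org/wiki/Athens"))
```

The Wikipedia URL comes back as 'unknown', which is the whole problem in a nutshell: nothing in its shape says whether it means the city or the article.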

Looking at the DBpedia page http://dbpedia.org/page/Athens is useful because it gives a list of resources that are each related to dbpedia:Athens via owl:sameAs. These are:

Before looking at one of these: what is owl:sameAs? The OWL Web Ontology Language is described here. Among the descriptions of owl:sameAs given there: "A typical use of sameAs would be to equate individuals defined in different documents to one another, as part of unifying two ontologies". So the DBpedia usage, which many other semantic web resources share, is spot on.
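Since owl:sameAs says two URIs name the same individual, a consumer can treat it as symmetric and transitive and merge identifiers into groups. Here is a small union-find sketch of that idea. It is not an OWL reasoner, and http://example.org/id/athens is a made-up URI standing in for some third data set.

```python
from typing import Dict, Iterable, List, Set, Tuple


def same_as_groups(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs into identity classes, reading owl:sameAs as an equivalence relation."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    groups: Dict[str, Set[str]] = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


same_as = [
    ("http://dbpedia.org/resource/Athens", "http://sws.geonames.org/264371/"),
    ("http://sws.geonames.org/264371/", "http://example.org/id/athens"),  # hypothetical
]
print(same_as_groups(same_as))
```

The DBpedia and hypothetical URIs never appear in the same pair, yet they land in one group because both are sameAs the Geonames concept.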

The geonames.org reference is interesting, in part because the site has a discussion that explicitly addresses the difference between concept and document: http://www.geonames.org/ontology/. That page also links to a good blog post.

DBpedia follows the Geonames guidelines in using owl:sameAs to qualify its link to http://sws.geonames.org/264371/, which is the Geonames URI for the concept "Athens". Clicking on that redirects you to the page http://www.geonames.org/264371/athens.html. Note the change of host to 'www.geonames.org' and the addition of 'athens.html'. The serial number, 264371, remains the same.
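Because the serial number survives the redirect, it can serve as the key that ties the Geonames URLs together. A rough sketch, assuming the three URL shapes quoted in this post are the only ones that matter; the regular expression and function names are hypothetical, not anything Geonames publishes.

```python
import re
from typing import Optional

# Matches the numeric Geonames ID in URLs shaped like those quoted in this post:
#   http://sws.geonames.org/264371/            (concept)
#   http://sws.geonames.org/264371/about.rdf   (RDF document)
#   http://www.geonames.org/264371/athens.html (HTML page)
_GEONAMES_URL = re.compile(r"^https?://(?:sws|www)\.geonames\.org/(\d+)(?:/|$)")


def geonames_id(url: str) -> Optional[str]:
    """Return the numeric ID from a Geonames URL, or None if the URL doesn't match."""
    match = _GEONAMES_URL.match(url)
    return match.group(1) if match else None


def concept_uri(feature_id: str) -> str:
    """Build a concept URI: sws host, numeric ID, trailing slash."""
    return f"http://sws.geonames.org/{feature_id}/"


for url in (
    "http://sws.geonames.org/264371/",
    "http://sws.geonames.org/264371/about.rdf",
    "http://www.geonames.org/264371/athens.html",
):
    print(url, "->", concept_uri(geonames_id(url)))
```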

Here is a screen grab of the "balloon" that is displayed next to the icon indicating the location of Athens.


There are two interesting links shown in this image: 'perma link' and 'semantic web rdf':

http://www.geonames.org/264371/athens.html is just the link to the page. http://sws.geonames.org/264371/about.rdf is an RDF document; it's worth looking at its source to see the attribute 'rdf:about="http://sws.geonames.org/264371/"'. So URLs ending in 'about.rdf' are documents, while http://sws.geonames.org/264371/ is a concept.
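Pulling the rdf:about value out programmatically takes only Python's standard ElementTree. The RDF below is a short hand-written stand-in for about.rdf, not its actual contents; only the rdf:about attribute matters here. A real file may describe more than one resource, so a client still has to decide which value is the concept.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ABOUT = "{" + RDF_NS + "}about"  # ElementTree's namespaced name for rdf:about

# Hand-written, abbreviated stand-in for http://sws.geonames.org/264371/about.rdf
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:gn="http://www.geonames.org/ontology#">
  <gn:Feature rdf:about="http://sws.geonames.org/264371/">
    <gn:name>Athens</gn:name>
  </gn:Feature>
</rdf:RDF>"""


def described_resources(rdf_xml: str) -> List[str]:
    """Return every rdf:about value in an RDF/XML document, in document order."""
    root = ET.fromstring(rdf_xml)
    return [el.get(RDF_ABOUT) for el in root.iter() if el.get(RDF_ABOUT) is not None]


print(described_resources(SAMPLE_RDF))
```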

Even with this soup of web addresses, there is a lot that Geonames is doing right. The only missed opportunity I see is that the "264371/athens.html" page gives no explicit indication of the concept address. What it does have is '<link rel="alternate" type="application/rdf+xml" title="RDF Version" href="http://sws.geonames.org/264371/about.rdf" />'. This is a link to a document, not a concept. And 'alternate' is too vague for me to know that I can parse that RDF to find its @about value.
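To see how little 'alternate' gives a client, here is a sketch using Python's standard html.parser that finds the RDF alternate link in a page head. All it yields is the document URL; getting to the concept still means fetching and parsing that RDF. The HTML snippet is an abbreviated stand-in for the real page.

```python
from html.parser import HTMLParser
from typing import Dict, List


class LinkCollector(HTMLParser):
    """Collects the attributes of every <link> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; valueless attributes arrive as None.
        # Self-closing <link ... /> tags are routed here by the default handle_startendtag.
        if tag == "link":
            self.links.append({name: value or "" for name, value in attrs})


def rdf_alternates(html: str) -> List[str]:
    """Return hrefs of <link rel="alternate" type="application/rdf+xml"> elements."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [
        link["href"]
        for link in parser.links
        if "alternate" in link.get("rel", "").lower().split()
        and link.get("type", "").lower() == "application/rdf+xml"
        and "href" in link
    ]


# Abbreviated stand-in for the head of http://www.geonames.org/264371/athens.html
PAGE_HEAD = """<head>
<title>Athens</title>
<link rel="alternate" type="application/rdf+xml" title="RDF Version"
      href="http://sws.geonames.org/264371/about.rdf" />
</head>"""

print(rdf_alternates(PAGE_HEAD))
```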

It would be nice if there were something like '<link rel="concept" type="application/rdf+xml" title="Concept URI" href="http://sws.geonames.org/264371/" />'. I'm not too concerned with what's in @type, so I left it as is. But 'concept' is not in any way standard. I just made it up.
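If a page did carry that made-up rel="concept" link, a client could read the concept URI straight from the HTML, with no RDF fetch at all. A sketch, assuming that hypothetical relation; it is not a registered link type, and the class and function names are mine.

```python
from html.parser import HTMLParser
from typing import Optional


class ConceptLinkFinder(HTMLParser):
    """Finds the first <link rel="concept"> in a page.

    rel="concept" is the hypothetical relation proposed in this post, not a standard.
    """

    def __init__(self) -> None:
        super().__init__()
        self.concept: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.concept is not None:
            return
        attributes = {name: value or "" for name, value in attrs}
        # rel holds a space-separated list of tokens, so match on a token, not the whole string.
        if "concept" in attributes.get("rel", "").lower().split() and attributes.get("href"):
            self.concept = attributes["href"]


def concept_uri_from_html(html: str) -> Optional[str]:
    """Return the href of the first rel="concept" link, or None if the page has none."""
    finder = ConceptLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.concept


print(concept_uri_from_html(
    '<head><link rel="concept" type="application/rdf+xml" title="Concept URI" '
    'href="http://sws.geonames.org/264371/" /></head>'
))
```

Pages without the link simply return None, so a client could fall back to the RDF route when needed.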

If this post has a point, that's it. Make it really easy for me to figure out which URI is for the concept, because that's the one I really want to use. Or maybe I should end with a question. Is there an unambiguous and widely-accepted convention for indicating the concept lying behind a document? If not, we need one.

1 comment:

Gabriel Bodard said...

Doesn't the #this or #me identifier turn a uri from a resource referent to a real world object/person referent?

(I'm trying to find the reference to this, but searching for "#this" using a common web search engine turns out not to be very helpful...)
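The #this/#me idea works because HTTP clients drop the fragment before making a request, so http://example.org/places/athens#this can name the thing while http://example.org/places/athens names the document describing it. Python's urllib.parse.urldefrag shows the split; the example.org URIs are hypothetical. Note that a fragment alone doesn't prove a URI names a thing, since ordinary section anchors are fragments too.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_hash_uri(uri: str) -> Tuple[str, str]:
    """Split a hash URI into (document URL, fragment).

    The document URL is what an HTTP client would actually request; under the
    #this / #me convention, the fragment marks the URI as naming the thing itself.
    """
    result = urldefrag(uri)
    return result.url, result.fragment


print(split_hash_uri("http://example.org/places/athens#this"))
```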