Tuesday, January 26, 2010

RDFa Patterns for Ancient World References

I am continuing to experiment with semantic links within digital publications relevant to the Ancient World. Here's a snippet from the same article I drew from in the last post.
In 124, Polemon had spoken before Hadrian and persuaded him to make a gift of money and grant a series of honors to Smyrna, not least of which was a second temple to the imperial cult (IvS 697; Burrell 2004: 42-48).
The "things" I want to identify are:
  • The year 124 as an event.
  • The sophist Polemon
  • The emperor Hadrian
  • The imperial cult
  • And the two citations
And I want to do this in a standards-based way that is automatically recognizable by third parties (or at least their software agents).

As before, I'm using RDFa. In a future post, I'll explain this choice and talk about what RDFa and RDF are, but for now I'm diving right in.

The relevant namespaces that I'm using are:
  • xmlns:dbpedia="http://dbpedia.org/resource/"
  • xmlns:cito="http://purl.org/net/cito/"
  • xmlns:ev="http://purl.org/rss/1.0/modules/event/"
  • xmlns:ex="http://example.org/"
  • xmlns:foaf="http://xmlns.com/foaf/0.1/"
  • xmlns:frbr="http://purl.org/vocab/frbr/core#"
  • xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
  • xmlns:owl="http://www.w3.org/2002/07/owl#"
  • xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  • xmlns:skos="http://www.w3.org/2008/05/skos#"
  • xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
All the markup that follows is experimental and comments are welcome, of course.
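
To make the role of these prefixes concrete, here is a minimal Python sketch of how a processor turns a prefixed name (a CURIE) into a full URI: it concatenates the namespace and the local part. The function name 'expand_curie' is my own, not part of any library, and the 'xsd' entry is written with a trailing '#', which is the form RDF datatypes use.

```python
# Prefix table copied from the namespace list above.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "cito": "http://purl.org/net/cito/",
    "ev": "http://purl.org/rss/1.0/modules/event/",
    "ex": "http://example.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "frbr": "http://purl.org/vocab/frbr/core#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2008/05/skos#",
    # Assumption: trailing '#' so datatype names expand to fragment URIs.
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URI by plain concatenation."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


print(expand_curie("dbpedia:Polemon_of_Laodicea"))
print(expand_curie("owl:sameAs"))
```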

Polemon
The reference to Polemon now looks like:
<span id="id2209"
about="#id2209"
typeof="skos:Concept foaf:Person"
resource="[dbpedia:Polemon_of_Laodicea]"
rel="owl:sameAs cite"
property="rdfs:label">Polemon</span>


With the '<head>' of the document including '<base href="http://example.org/ajn2006-smyrna.html"/>', that RDFa gives the following RDF/Turtle:

<http://example.org/ajn2006-smyrna.html#id2209>
owl:sameAs dbpedia:Polemon_of_Laodicea ;
a skos:Concept, foaf:Person ;
<http://www.w3.org/1999/xhtml/vocab#cite> dbpedia:Polemon_of_Laodicea ;
rdfs:label "Polemon"@en .
Some observations:
The pairing of 'id' and 'about' attributes means that I can identify a span of text and then say things about it.

I then give that span a type. Here I say that it's a skos:Concept and a foaf:Person. Which concept and which person? http://dbpedia.org/resource/Polemon_of_Laodicea. 'skos:Concept' will be used on all named-entities, and their nature will be further qualified when it's useful.

Why 'owl:sameAs'? Here I follow the usage of dbpedia.org. If you look at the Polemon page, you'll see the same construct used to make the link to freebase. 'owl:sameAs' also underlies sameas.org (see the N3 for Hadrian).

The metaphor here is that I am instantiating Polemon as a concept and person present in the text. That should be recognizable and actionable. There is some redundancy in how I go about doing it, but that is in the spirit of convenience for future processors of this data.
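
For readers who want to see how the span turns into triples, here is a rough Python sketch using only the standard library. It is emphatically not a conforming RDFa parser: it only looks at flat 'span' elements, ignores 'xml:lang' (so the label comes out without '@en'), and treats an unbracketed 'resource' as an already-absolute URI. The names 'SpanTriples', 'expand', and 'extract' are mine.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "http://example.org/ajn2006-smyrna.html"  # the <base href> of the post
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2008/05/skos#",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XHV = "http://www.w3.org/1999/xhtml/vocab#"


def expand(term):
    """Expand a CURIE; a bare term such as 'cite' falls back to the XHTML vocabulary."""
    prefix, sep, local = term.partition(":")
    return PREFIXES[prefix] + local if sep else XHV + term


class SpanTriples(HTMLParser):
    """Collects (subject, predicate, object) tuples from flat spans only."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._pending = None  # (subject, property) waiting for the span text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag != "span" or "about" not in a:
            return
        subject = urljoin(BASE, a["about"])  # '#id2209' resolved against the base
        for t in a.get("typeof", "").split():
            self.triples.append((subject, RDF_TYPE, expand(t)))
        if "resource" in a:
            value = a["resource"]
            # '[prefix:local]' is a safe CURIE; anything else is kept as a URI.
            target = expand(value[1:-1]) if value.startswith("[") else value
            for r in a.get("rel", "").split():
                self.triples.append((subject, expand(r), target))
        if "property" in a:
            self._pending = (subject, expand(a["property"]))

    def handle_data(self, data):
        if self._pending:
            self.triples.append((*self._pending, data))
            self._pending = None


def extract(markup):
    parser = SpanTriples()
    parser.feed(markup)
    parser.close()
    return parser.triples


SAMPLE = ('<span id="id2209" about="#id2209" typeof="skos:Concept foaf:Person" '
          'resource="[dbpedia:Polemon_of_Laodicea]" rel="owl:sameAs cite" '
          'property="rdfs:label">Polemon</span>')

for triple in extract(SAMPLE):
    print(triple)
```

The five tuples printed correspond to the five statements in the Turtle above: two types, 'owl:sameAs', the XHTML 'cite' relation, and the label.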

"In 124"
This looks like:
<span id="id3724"
about="#id3724"
typeof="skos:Concept frbr:Event"
rel="owl:sameAs"
resource="[dbpedia:124]"
property="ev:startdate"
datatype="xsd:year"
content="124">In 124</span>
Same basic process. I isolate some text as individually addressable. I say what it is, in this case a FRBR Event. Here I also embed a machine-readable property, the start date, into the document, but retain the inline text as the label.

But I am probably on less-firm ground here. I use FRBR because it's an LOC-approved standard. I annotate the event with an RSS Event property, and that's a little weak. It might also seem odd to equate the event with the dbpedia representation of the year 124. If you follow through to the Wikipedia version, though, it does refer to Hadrian's trip east, which is the setting for Polemon's speech. In the case of a better-known event, I think I'd prefer to link to a representation of that, for example http://dbpedia.org/page/Sack_of_Rome_(455). The 'owl:sameAs' on that page will eventually redirect you to the right Wikipedia page.

Here's the RDF/Turtle produced by the above RDFa:
<http://example.org/ajn2006-smyrna.html#id3724>
owl:sameAs <http://dbpedia.org/resource/124> ;
ev:startdate "124"^^xsd:year ;
a frbr:Event, skos:Concept .
As above, the goal is for this to be usable in a number of contexts.
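
One detail worth spelling out: in RDFa 1.0 the 'resource' attribute takes either a URI or a "safe CURIE" in square brackets. A bracketed value is expanded through the declared prefix, while a bare value is read as a URI as written, so a forgotten pair of brackets silently yields a URI whose "scheme" is the prefix name. The following Python sketch illustrates the distinction; the function names are mine, and the check is only a heuristic, not part of any RDFa library.

```python
import re

PREFIXES = {"dbpedia": "http://dbpedia.org/resource/"}
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")  # generic URI scheme syntax


def read_resource(value, prefixes=PREFIXES):
    """Return (kind, uri) for an RDFa 1.0 'resource' value."""
    if value.startswith("[") and value.endswith("]"):
        prefix, _, local = value[1:-1].partition(":")
        return "safe CURIE", prefixes[prefix] + local
    if SCHEME.match(value):
        return "absolute URI", value  # kept exactly as written
    return "relative URI", value


def looks_like_unbracketed_curie(value, prefixes=PREFIXES):
    """Heuristic: a bare value whose 'scheme' is one of our declared prefixes."""
    kind, _ = read_resource(value, prefixes)
    return kind == "absolute URI" and value.split(":", 1)[0] in prefixes


for v in ("[dbpedia:Hadrian]", "dbpedia:Hadrian",
          "http://pleiades.stoa.org/places/550771"):
    print(v, read_resource(v), looks_like_unbracketed_curie(v))
```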

References
There are two inline references at the end of the sentence. The first is to a primary source, an inscription at Smyrna as published in Petzl, G. (1982). Die Inschriften von Smyrna. Bonn: Habelt. The second is to Barbara Burrell's book: Burrell, B. (2004). Neokoroi: Greek cities and Roman emperors. Cincinnati classical studies, new ser., v. 9. Leiden: Brill.

Here's the RDFa for the second:
<span id="id4616"
about="#id4616"
typeof="ex:Citation"
rel="cito:citesAsAuthority cite"
resource="http://www.worldcat.org/oclc/53013513"
property="rdfs:label">Burrell 2004: 42-48</span>
This markup is similar to the previous examples, except that I'm not instantiating it as a 'skos:Concept'. I am using the CITO ontology to indicate the relationship between the works, but note that I'm currently making up the type 'ex:Citation'. Perhaps I could use 'cito:Document' but that doesn't seem quite right. I really want to mark this span of text as being a citation but haven't found just the right RDF vocabulary. I looked at BIBO but, like CITO, it doesn't have the exact class I want. BIBO is linked with Zotero so I'd like to use it. For now, CITO has a more detailed set of relationships between citing and cited documents so I'm going with that. WorldCat also isn't great because there's confusion about the 'terms of use' but it will do for this experimental phase.

Here's the RDF/Turtle:
<http://example.org/ajn2006-smyrna.html#id4616>
cito:citesAsAuthority <http://www.worldcat.org/oclc/53013513> ;
a ex:Citation ;
<http://www.w3.org/1999/xhtml/vocab#cite> <http://www.worldcat.org/oclc/53013513> ;
rdfs:label "Burrell 2004: 42-48"@en .

The RDFa for the epigraphic reference looks like:
<span id="id9773"
about="#id9773"
typeof="ex:Citation"
rel="cito:citesAsAuthority ex:citesAsPrimarySource"
resource="http://www.worldcat.org/oclc/8935414"
property="rdfs:label"><i>IvS</i> 697</span>
The main difference here is that I'm also making up the 'ex:citesAsPrimarySource' value for the rel attribute. The concept of "Primary Source" and references thereto is important for the Humanities and we need a way of indicating its usage.

It's also important that I'm referring to the publication of the inscription, not the inscription itself. When a digital surrogate becomes available, I can point to that. In the meantime, a way of standardizing references to parts of a work would be useful. But I don't think you can just tag on a fragment identifier, as in http://www.worldcat.org/oclc/8935414#no.%20697, since the implication there is that such an ID actually exists. And it might be rude to put the same after a '?'. Something to ponder...
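
To make the two options concrete, here is a small Python sketch that builds both URL forms and shows what an HTTP client would actually send for each. The fragment never leaves the client, so it costs the server nothing but implies an anchor that doesn't exist; the query string is sent to the server, which is where the rudeness comes in. The function names are mine.

```python
from urllib.parse import quote, urlsplit

WORK = "http://www.worldcat.org/oclc/8935414"  # Petzl 1982, as cited above


def with_fragment(work, part):
    """Append the part reference as a percent-encoded fragment."""
    return f"{work}#{quote(part)}"


def with_query(work, part):
    """Append the part reference as a percent-encoded query string."""
    return f"{work}?{quote(part)}"


def request_target(url):
    """The URL as an HTTP client sends it: the fragment is dropped."""
    return urlsplit(url)._replace(fragment="").geturl()


frag = with_fragment(WORK, "no. 697")
query = with_query(WORK, "no. 697")
print(frag, "->", request_target(frag))
print(query, "->", request_target(query))
```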


Instead of continuing on with each named entity, here's the whole sentence with RDFa visible:
<span id="id3724" about="#id3724" typeof="skos:Concept frbr:Event" rel="owl:sameAs" resource="[dbpedia:124]" property="ev:startdate" datatype="xsd:year" content="124">In 124</span>, <span id="id2209" about="#id2209" typeof="skos:Concept foaf:Person" resource="[dbpedia:Polemon_of_Laodicea]" rel="owl:sameAs cite" property="rdfs:label">Polemon</span> had spoken before <span id="id5130" about="#id5130" typeof="skos:Concept foaf:Person" rel="owl:sameAs cite" resource="[dbpedia:Hadrian]" property="rdfs:label">Hadrian</span> and persuaded him to make a gift of money and grant a series of honors to <span id="id39156" about="#id39156" typeof="skos:Concept geo:SpatialThing" rel="owl:sameAs cite" resource="http://pleiades.stoa.org/places/550771" property="rdfs:label">Smyrna</span>, not least of which was a second temple to the <span id="id4168" about="#id4168" typeof="skos:Concept dbpedia:Religion" rel="owl:sameAs cite" resource="[dbpedia:Imperial_cult_(ancient_Rome)]" property="rdfs:label">imperial cult</span> (<span id="id9773" about="#id9773" typeof="ex:Citation" rel="cito:citesAsAuthority ex:citesAsPrimarySource" resource="http://www.worldcat.org/oclc/8935414" property="rdfs:label"><i>IvS</i> 697</span>; <span id="id4616" about="#id4616" typeof="ex:Citation" rel="cito:citesAsAuthority cite" resource="http://www.worldcat.org/oclc/53013513" property="rdfs:label">Burrell 2004: 42-48</span>).
And here's the RDF/Turtle:

<http://example.org/ajn2006-smyrna.html#id3724>
owl:sameAs <http://dbpedia.org/resource/124> ;
ev:startdate "124"^^xsd:year ;
a frbr:Event, skos:Concept .

<http://example.org/ajn2006-smyrna.html#id2209>
owl:sameAs dbpedia:Polemon_of_Laodicea ;
a skos:Concept, foaf:Person ;
<http://www.w3.org/1999/xhtml/vocab#cite> dbpedia:Polemon_of_Laodicea ;
rdfs:label "Polemon"@en .

<http://example.org/ajn2006-smyrna.html#id5130>
owl:sameAs dbpedia:Hadrian ;
a skos:Concept, foaf:Person ;
<http://www.w3.org/1999/xhtml/vocab#cite> dbpedia:Hadrian ;
rdfs:label "Hadrian"@en .

<http://example.org/ajn2006-smyrna.html#id39156>
owl:sameAs <http://pleiades.stoa.org/places/550771> ;
a geo:SpatialThing, skos:Concept ;
<http://www.w3.org/1999/xhtml/vocab#cite> <http://pleiades.stoa.org/places/550771> ;
rdfs:label "Smyrna"@en .

<http://example.org/ajn2006-smyrna.html#id4168>
owl:sameAs <http://dbpedia.org/resource/Imperial_cult_(ancient_Rome)> ;
a dbpedia:Religion, skos:Concept ;
<http://www.w3.org/1999/xhtml/vocab#cite> <http://dbpedia.org/resource/Imperial_cult_(ancient_Rome)> ;
rdfs:label "imperial cult"@en .

<http://example.org/ajn2006-smyrna.html#id9773>
ex:citesAsPrimarySource <http://www.worldcat.org/oclc/8935414> ;
cito:citesAsAuthority <http://www.worldcat.org/oclc/8935414> ;
a ex:Citation ;
rdfs:label "<i>IvS</i> 697"^^rdf:XMLLiteral .

<http://example.org/ajn2006-smyrna.html#id4616>
cito:citesAsAuthority <http://www.worldcat.org/oclc/53013513> ;
a ex:Citation ;
<http://www.w3.org/1999/xhtml/vocab#cite> <http://www.worldcat.org/oclc/53013513> ;
rdfs:label "Burrell 2004: 42-48"@en .
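
Since every subject above is built from a paired 'id' and 'about', a slip in either attribute quietly produces a subject URI that points at nothing in the document. A quick check along the following lines can catch that before publishing. This is a sketch using Python's standard 'html.parser'; the class name, the function name, and the sample markup with its ids are all made up for illustration.

```python
from html.parser import HTMLParser


class AboutChecker(HTMLParser):
    """Records elements whose 'about' is not '#' followed by their own 'id'."""

    def __init__(self):
        super().__init__()
        self.mismatches = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "id" in a and "about" in a and a["about"] != "#" + a["id"]:
            self.mismatches.append((a["id"], a["about"]))


def find_mismatches(markup):
    checker = AboutChecker()
    checker.feed(markup)
    checker.close()
    return checker.mismatches


# Hypothetical markup: the second span's 'about' has lost its 'id' prefix.
SAMPLE = ('<p><span id="id0001" about="#id0001">first</span> and '
          '<span id="id0002" about="#0002">second</span></p>')

print(find_mismatches(SAMPLE))
```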


Some of these constructs deserve more comment but this post is getting long. The only thing to add is that fairly soon I will publish a JavaScript toolset that starts making use of these patterns.

3 comments:

Michael Hausenblas said...

Very nice use case and great post! Just one thing to note: please don't 'reuse' the @id values in the @about as subjects unless you really want to state something about the HTML element (paragraph, etc.) at hand.

I tried to explain it in [1] and [2] - if something is unclear, please lemme know and I try to be more specific.

Cheers,
Michael

[1] http://ld2sd.deri.org/lod-ng-tutorial/#checklist-fragid
[2] http://www.w3.org/2001/sw/wiki/RDFa/FragmentIdentifiers

Sebastian Heath said...

Hi Michael,

Thanks for the comment. I do understand the issue. As described, the use of fragment identifiers was intentional. The issues probably go beyond a comment on an old post, but to summarize too briefly, I'd ask: in a situation where xhtml+rdfa is the rdf serialization, do we have to worry about content negotiation/http issues? Also, though there might be some contortion involved, I do mean to be saying something about the xhtml itself: that it's the instantiation of the concept within the text and is related by owl:sameAs.

None of this is quite to disagree, only to say that I think the issue perhaps becomes different when the html and rdf representation are combined via the mechanism of RDFa. Is that a content-negotiation-free scenario, particularly if the content isn't being delivered over http but via a file system? Perhaps this is worth a post of its own.

-Sebastian

Michael Hausenblas said...

Sebastian,

I see. No, this is not really about conneg in the first place. This is an additional, orthogonal issue.

However, coming back to the core issue here - I can't imagine that you *really* want to state that, for example, a span element in your HTML document is of type foaf:Person, or? :)

Cheers,
Michael