Tuesday, June 1, 2010

References in Digital Publications

Modern scholarship relies on citation. It's efficient in that one work can incorporate the results of another without having to repeat them. It's also a requirement of our modern academic culture that if you use somebody's idea, you give that person credit. There's more to be said on both points, but this post is more about mechanics than purpose. (Though see here for a recent discussion of purpose. [I fall into the camp of: if you want credit for your work, make it easy to identify and be generous in giving credit to others. If you don't need credit, that's OK, but still give it.])

Back to references. They come in many forms in print works. In pre-linked media, among the purposes of citation is to give future readers the information they need to physically acquire the referenced work. That is, you take the title of the book or journal, go to the library to find the volume, and then start reading.

It is one of the great glories of the Internet that this physical labor is no longer always necessary. The simple construct '<a href="http://sebastianheath.com/files/HeathS2010-DigitalResearch.pdf">I wrote this</a>' is rendered as 'I wrote this', so that a mere click takes you directly to the article.

That form of link is too simple to support modern scholarly practice. Citations of the form (Heath 2010) give the reader a preliminary indication of who wrote a referenced work. Full information in footnotes further enriches the reading experience, but at the cost of possibly interrupting the flow of an argument or depriving the reader of a collected bibliography at the end of a work. Choose your own preference; that's not my point here.

Instead, I am exploring specific patterns of markup that promote access to referenced works while also recording bibliographic metadata in a robust and sustainable fashion. Two needs, two solutions.

Here's some markup: Late Roman pottery is very visible in Aegean landscapes (<a rel="dcterms:references" href="http://hdl.handle.net/10.2972/hesp.76.4.743">Pettegrew 2007</a>).

If we momentarily ignore the question of whether or not Handle records are good stable URIs for bibliographic resources, the semantics of this html are clear: it represents a citation of the 2007 article by David Pettegrew, The Busy Countryside of Late Roman Corinth. (Note: it doesn't reference the html page describing that title.)

The use of the term "dcterms:references" in the RDFa rel attribute follows from the Dublin Core's Guidelines for Encoding Bibliographic Citation Information in Dublin Core Metadata. In this context 'references' is a verb, not a plural noun.

That html will render as: "Late Roman pottery is very visible in Aegean landscapes (Pettegrew 2007)." Again, this is all pretty clear.

It's also worth noting that the 'a' element in html is a building-block of our search-engine enabled world. Scholarship should not fight that, but use it. As many have said, "you get this for free."

I do, however, want to pair this reference with bibliographic metadata. Here's where some more RDFa comes in.

'http://hdl.handle.net/10.2972/hesp.76.4.743' is a unique identifier for Pettegrew's article. This suggests the following snippet: <div about="http://hdl.handle.net/10.2972/hesp.76.4.743"><span property="dcterms:bibliographicCitation">Pettegrew, D. (2007). "The Busy Countryside of Late Roman Corinth: Interpreting Ceramic Data Produced by Regional Archaeological Surveys" In <i>Hesperia</i> 76.4: 743-784.</span></div>
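Since the inline citation and the bibliographic entry hang off the same URI, both can be generated from a single record. What follows is only a minimal sketch in Python: the function names `citation_link` and `reference_entry` and the shape of `record` are hypothetical, invented here for illustration, and the citation text is assumed to be trusted html already.

```python
from html import escape

def citation_link(uri: str, label: str) -> str:
    """Inline citation: an anchor whose rel says 'this document references uri'."""
    return (f'<a rel="dcterms:references" href="{escape(uri, quote=True)}">'
            f'{escape(label)}</a>')

def reference_entry(uri: str, citation_html: str) -> str:
    """Reference-list entry: bibliographic text attached to the same uri.

    citation_html is inserted as-is (it may contain <i>), so it is assumed
    to be trusted markup rather than user input.
    """
    return (f'<div about="{escape(uri, quote=True)}">'
            f'<span property="dcterms:bibliographicCitation">'
            f'{citation_html}</span></div>')

# Hypothetical record structure; the values come from the example in this post.
record = {
    "uri": "http://hdl.handle.net/10.2972/hesp.76.4.743",
    "label": "Pettegrew 2007",
    "citation_html": ('Pettegrew, D. (2007). "The Busy Countryside of Late Roman '
                      'Corinth: Interpreting Ceramic Data Produced by Regional '
                      'Archaeological Surveys" In <i>Hesperia</i> 76.4: 743-784.'),
}

print(citation_link(record["uri"], record["label"]))
print(reference_entry(record["uri"], record["citation_html"]))
```

Keeping the URI in one place means the link in the text and the entry in the reference list cannot drift apart.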

These two snippets can be adapted and combined with a little more RDFa scaffolding:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:dcterms="http://purl.org/dc/terms/" >
<body about="http://example.org/example_document">
<h1>My Text</h1>
<p>Late Roman pottery is very visible in Aegean landscapes (<a rel="dcterms:references" href="http://hdl.handle.net/10.2972/hesp.76.4.743">Pettegrew 2007</a>).</p>
<h1>References</h1>
<p about="http://hdl.handle.net/10.2972/hesp.76.4.743" property="dcterms:bibliographicCitation">Pettegrew, D. (2007). "The Busy Countryside of Late Roman Corinth: Interpreting Ceramic Data Produced by Regional Archaeological Surveys" In <i>Hesperia</i> 76.4: 743-784.</p>
</body>
</html>


Pointing an RDFa extractor at that html gives:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://www.w3.org/1999/xhtml> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://example.org/example_document>
   dcterms:references <http://hdl.handle.net/10.2972/hesp.76.4.743> .

<http://hdl.handle.net/10.2972/hesp.76.4.743>
   dcterms:bibliographicCitation "Pettegrew, D. (2007). \"The Busy Countryside of Late Roman Corinth: Interpreting Ceramic Data Produced by Regional Archaeological Surveys\" In <i xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:dcterms=\"http://purl.org/dc/terms/\">Hesperia</i> 76.4: 743-784."^^rdf:XMLLiteral .


The shorter version of which is: example.org/example_document references Pettegrew 2007 and even knows something about it. There are lots of third-party tools that can find this information when it is encoded in this way. And I could enrich the 'bibliographicCitation' to include parsable information on author, title, date, etc. That's for another time.
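To give a feel for how little a tool has to do to recover those two statements, here is a rough sketch using only Python's standard library. It is not a conforming RDFa processor: it understands only the `about`, `rel` plus `href`, and `property` patterns used above, leaves predicates as prefixed names instead of expanding them, and returns the citation as plain text rather than an XMLLiteral. The names `CitationTriples` and `extract` are made up for this illustration.

```python
from html.parser import HTMLParser

class CitationTriples(HTMLParser):
    """Collects (subject, predicate, object) tuples for two RDFa patterns only."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._open = []  # stack of [tag, subject, property or None, text parts]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        inherited = self._open[-1][1] if self._open else None
        subject = a.get("about", inherited)  # @about sets a new subject
        if "rel" in a and "href" in a and subject:
            # e.g. <a rel="dcterms:references" href="...">
            self.triples.append((subject, a["rel"], a["href"]))
        self._open.append([tag, subject, a.get("property"), []])

    def handle_data(self, data):
        # Text counts toward every open element that declared a property.
        for frame in self._open:
            if frame[2]:
                frame[3].append(data)

    def handle_endtag(self, tag):
        if tag not in [frame[0] for frame in self._open]:
            return  # stray end tag; ignore it
        # Pop back to the matching element, emitting literals on the way.
        while self._open:
            name, subject, prop, parts = self._open.pop()
            if prop and subject:
                self.triples.append((subject, prop, "".join(parts).strip()))
            if name == tag:
                break

def extract(html_text):
    parser = CitationTriples()
    parser.feed(html_text)
    parser.close()
    return parser.triples

DOC = '''<html xmlns="http://www.w3.org/1999/xhtml" xmlns:dcterms="http://purl.org/dc/terms/">
<body about="http://example.org/example_document">
<p>Late Roman pottery is very visible in Aegean landscapes (<a rel="dcterms:references" href="http://hdl.handle.net/10.2972/hesp.76.4.743">Pettegrew 2007</a>).</p>
<p about="http://hdl.handle.net/10.2972/hesp.76.4.743" property="dcterms:bibliographicCitation">Pettegrew, D. (2007). "The Busy Countryside of Late Roman Corinth: Interpreting Ceramic Data Produced by Regional Archaeological Surveys" In <i>Hesperia</i> 76.4: 743-784.</p>
</body>
</html>'''

for triple in extract(DOC):
    print(triple)
```

For real work a proper RDFa extractor is the better choice; the point of the sketch is only that the markup carries enough structure for even a naive parser to find the citation and its metadata.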

I want to stress that I don't think this determines a particular citation style. Use footnotes if that's preferable. As long as the RDFa produces triples similar to the above, your information is useful. And some degree of run-time transformation is also possible, depending on the granularity of the markup.

2 comments:

sgillies said...

Speaking of footnotes, HTML5 will have no such specific element, but has some good guidance on implementing footnotes: http://www.w3.org/TR/html5/interactive-elements.html#footnotes.

I like the approach you propose. My simple markup tools, reStructuredText (Sphinx) and TinyMCE (Plone), aren't up to the task at present. Speaking of which, Dan Brickley has some thoughts about RDFa and TinyMCE at http://danbri.org/words/2009/12/23/507.

Eric Kansa said...

This is really useful.

I'm slowly starting to "Semanticize" some of Open Context, and one nice area to start would be to use this to annotate some of the bibliographic references we have.