Recording authorship, curation and digital creation with the PAV ontology

PAV [archived 2012-02-18, now GitHub pav-ontology/pav, archived] is a lightweight ontology for tracking Provenance, Authoring and Versioning. PAV supplies terms for distinguishing between the different roles of the agents contributing content in current web-based systems: contributors, authors, curators and digital artifact creators.

The ontology also provides terms for tracking provenance of digital entities that are published on the web and then accessed, transformed and consumed. PAV version 2.1.1 was released on 2013-03-27 [Google Code retired, archived 2016-06-08; now on GitHub], making PAV an extension of the W3C provenance ontology PROV-O, thus enabling interoperability between PAV and PROV-compliant tools such as ProvToolbox [archived].

Overview

Diagram overview of PAV: a resource has PAV properties such as createdBy and contributedBy pointing to Person agents. From the resource, createdWith, retrievedBy etc. go to Software agents, and providedBy goes to Organization agents. createdAt goes to a Location, and sourceAccessedAt, derivedFrom and retrievedFrom go to another resource.

Note: PAV does not define any classes, and the PAV properties do not put any explicit restrictions on their domain/ranges. Therefore the classes above, like “another resource”, are only for illustration of typical use. The diagram above does not show data properties attached to resources, like pav:createdOn.

Example

Here’s an example of using PAV:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix pav: <http://purl.org/pav/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix : <http://example.com/blog#> .

<http://example.com/blog.html> 
   pav:createdBy :alice ;
   pav:createdWith :wordpress ;

   pav:importedFrom <http://example.com/data.csv> ;
   pav:importedBy :csv2html ;

   pav:authoredBy :bob ;
   pav:curatedBy :charlie ;

   pav:authoredOn "2012-12-24T15:15:15Z"^^xsd:dateTime ;
   pav:importedOn "2013-03-27T10:06:17Z"^^xsd:dateTime .


:alice foaf:name "Alice" .
:bob foaf:name "Bob" .
:charlie foaf:name "Charlie" .
:csv2html a prov:SoftwareAgent ;
    foaf:homepage <https://github.com/mrc/csv2html> .
:wordpress a prov:SoftwareAgent ;
    foaf:homepage <http://wordpress.org/> .

This example shows how the blog post http://example.com/blog.html was createdBy Alice. The blog post was createdWith the software WordPress. The content of the blog was importedFrom http://example.com/data.csv (presumably a CSV file), and this import was performed by (importedBy) the script csv2html.

Although Alice is the creator (as she made the blog post), http://example.com/blog.html is authoredBy Bob: he made the original data and is therefore also the author of (the content of) the blog post.

The post was curatedBy Charlie, who perhaps edited the CSV (or HTML) to include the correct column headers. We also notice that the blog was importedOn 27 March 2013, while the content was authoredOn 24 December 2012. We don’t know when Charlie curated it, although this could have been provided with curatedOn.

Additional PAV properties allow specifying attributions such as contributors and the provider, as well as other kinds of sources, such as direct downloading, verification against source material, and derivation when further refinements have been made. Data can be given a version number, indicate its lineage to a previous version, and indicate when a source was last updated.
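As a rough illustration of these additional properties, the sketch below describes a hypothetical second version of the CSV file, plus a summary derived from it. The URIs, the :dave and :exampleOrg agents, the version string and the dates are all made up for illustration; only the pav: property names are taken from the ontology.

```turtle
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix pav: <http://purl.org/pav/> .
@prefix : <http://example.com/blog#> .

# Hypothetical new version of the data (all URIs and values are illustrative)
<http://example.com/data-v2.csv>
   # Attribution: who contributed, and who makes the resource available
   pav:contributedBy :dave ;
   pav:providedBy :exampleOrg ;

   # Source material that was consulted to verify the content
   pav:sourceAccessedAt <http://example.org/codebook> ;
   pav:sourceAccessedOn "2013-04-02T08:30:00Z"^^xsd:dateTime ;

   # Versioning: version number, lineage and date of the last update
   pav:version "2.0" ;
   pav:previousVersion <http://example.com/data.csv> ;
   pav:lastUpdateOn "2013-04-03T12:00:00Z"^^xsd:dateTime .

# Hypothetical resource refined further from the new version
<http://example.com/summary.csv>
   pav:derivedFrom <http://example.com/data-v2.csv> .
```

Here pav:previousVersion links two versions of the same data, while pav:derivedFrom marks summary.csv as a new resource built from it.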

The PAV approach

The goal of PAV is to provide a lightweight, straightforward way to give the essential information about authorship, provenance and versioning, and therefore these properties are described directly on the published resource.

As such, PAV does not define any classes or restrict domain/ranges, as all properties are applicable to any online resource.

This “flat” approach means that it is easy to use and query PAV without a deep understanding of provenance models, at the small cost that more complex relationships are not expressed.

For instance, pav:authoredBy allows multiple authors, but if there are multiple authors we won’t know who wrote what, or when they did so. Such details can be included alongside PAV using other PROV statements.
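One possible way to add such detail is sketched below, assuming the blog post can be split into sections that each get their own URI. The section fragments (#intro, #results), the second author :dave and the second timestamp are hypothetical; the PROV-O and Dublin Core terms are standard, but this is only one of several ways the detail could be modelled.

```turtle
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix pav: <http://purl.org/pav/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix : <http://example.com/blog#> .

# Flat PAV view: two authors, no indication of who wrote what or when
<http://example.com/blog.html>
   pav:authoredBy :bob, :dave ;
   dcterms:hasPart <http://example.com/blog.html#intro> ,
                   <http://example.com/blog.html#results> .

# Finer-grained PROV-O statements about each (hypothetical) section
<http://example.com/blog.html#intro> a prov:Entity ;
   prov:wasAttributedTo :bob ;
   prov:generatedAtTime "2012-12-24T15:15:15Z"^^xsd:dateTime .

<http://example.com/blog.html#results> a prov:Entity ;
   prov:wasAttributedTo :dave ;
   prov:generatedAtTime "2013-01-05T11:30:00Z"^^xsd:dateTime .
```

A consumer that only understands PAV still sees both authors on the blog post, while a PROV-aware tool can drill down to the per-section attribution.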

Combining PAV with other PROV extensions

Here’s an example combining PAV with another PROV extension, the Provenance Vocabulary.

@prefix pav: <http://purl.org/pav/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix prv: <http://purl.org/net/provenance/core#> .
@prefix prvTypes: <http://purl.org/net/provenance/types#> .
@prefix http: <http://www.w3.org/2006/http#> .

<http://example.com/data.csv> a prv:DataItem, prov:Entity ;
    pav:retrievedFrom <http://example.org/originalData> ;
    prv:retrievedBy [ a prvTypes:HTTPBasedDataAccess ;
        prv:accessedResource <http://example.org/originalData> ;
        prvTypes:exchangedHTTPMessage [ a http:Request ;
            http:httpVersion "1.1" ;
            http:methodName "GET" ;
            http:requestURI "http://example.org/originalData" ;
            http:headers ( [ a http:MessageHeader ;
                             http:fieldName "Accept" ;
                             http:fieldValue "text/csv" ] )
        ]
    ] .

This example shows how http://example.com/data.csv was downloaded from http://example.org/originalData over HTTP 1.1, requesting text/csv through content negotiation (Accept: text/csv).

The PAV term pav:retrievedFrom gives the short-cut to the original data, while prv:retrievedBy gives details of the transport and content-negotiation used in the download.

By combining vocabularies in this way, it is possible to query for PAV statements to get a general overview of the provenance, and then explore other PROV statements for more specific details, whose structure might not be known in advance.

Using PAV

To use PAV, import the PAV ontology from the namespace http://purl.org/pav/ and read the PAV documentation. The PAV wiki pages [Google Code retired, archived 2012-02-18, see GitHub Wiki] contain details about the PAV versions [archived 2016-08-06, see GitHub Wiki]. If your ontology extends or uses PAV, you can use either:

owl:imports <http://purl.org/pav/>

for the latest version (which might at some point include additional properties), or

owl:imports <http://purl.org/pav/2.1>

for the latest patch version of 2.1 (i.e. no new terms will be added later).
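In context, the import could look like the following minimal ontology header. The ontology URI http://example.com/my-ontology and the translatedBy property are hypothetical and only show where the owl:imports statement goes and how an extension term might be attached to a PAV property.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix pav: <http://purl.org/pav/> .

# Hypothetical ontology that pins itself to the 2.1 series of PAV
<http://example.com/my-ontology> a owl:Ontology ;
    owl:imports <http://purl.org/pav/2.1> .

# Hypothetical extension term, specialising a PAV property
<http://example.com/my-ontology#translatedBy> a owl:ObjectProperty ;
    rdfs:subPropertyOf pav:contributedBy .
```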

Extracting PROV-O statements

As PAV is meant as a lightweight ontology, the inferred PROV-O statements are not usually included explicitly. Any OWL or RDFS reasoner should be able to infer the PROV-O statements as long as the PAV ontology is imported from http://purl.org/pav/.
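As a sketch of what such inference might yield for the blog example at the top of this page: assuming the imported ontology declares the attribution properties (pav:createdBy, pav:authoredBy, pav:curatedBy) as sub-properties of prov:wasAttributedTo and pav:importedFrom as a sub-property of prov:wasDerivedFrom, a reasoner would add statements along these lines. Check the ontology version you import for the exact mappings.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix : <http://example.com/blog#> .

# Inferred statements (not asserted in the original example);
# the sub-property mappings are assumptions to verify against the ontology
<http://example.com/blog.html>
   # from pav:createdBy, pav:authoredBy and pav:curatedBy
   prov:wasAttributedTo :alice, :bob, :charlie ;
   # from pav:importedFrom
   prov:wasDerivedFrom <http://example.com/data.csv> .
```

The inferred statements lose the distinction between creator, author and curator, which is why the PAV statements are kept alongside them.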

As an example of PAV interoperability with PROV, we built a Taverna workflow which uses the OWL reasoner Pellet [archived 2014-02-07, now GitHub stardog-union/pellet, archived] to infer PROV statements, and then visualizes them as SVG using ProvToolbox. Here’s the diagram (as PNG) visualizing the PAV example from the beginning of this page:

pav-example