Formalizing RO-Crate in First Order Logic

Below is a formalization of the concept of RO-Crate as a set of relations using First Order Logic:


Definition of language 𝕃𝖗𝖔𝖈𝖗𝖆𝖙𝖊:

𝕃𝖗𝖔𝖈𝖗𝖆𝖙𝖊 = { Property(p), Class(c), Value(x), ℝ, 𝕊 }
       𝔻 =  𝕀𝕣𝕚
     𝕀𝕣𝕚 ≡  { IRIs as defined in RFC3987 }
       ℝ ≡  { real or integer numbers }
       𝕊 ≡  { literal strings }

The domain of discourse is the set of 𝕀𝕣𝕚 identifiers [42] (notation <>)¹, with additional descriptions using numbers ℝ (notation 13.37) and literal strings 𝕊 (notation “Hello”).

From this formalised language 𝕃𝖗𝖔𝖈𝖗𝖆𝖙𝖊 we can interpret an RO-Crate in any representation that can gather these descriptions, their properties, classes, and literal attributes.

Minimal RO-Crate

Below we use 𝕃𝖗𝖔𝖈𝖗𝖆𝖙𝖊 to define a minimal² RO-Crate:

                ROCrate(R) ⊨  Root(R) ∧ Mentions(R, R) ∧ hasPart(R, d) ∧
                               Mentions(R, d) ∧ DataEntity(d) ∧
                               Mentions(R, c) ∧ ContextualEntity(c)
                ∀r Root(r) ⇒  Dataset(r) ∧ name(r, n) ∧
                               description(r, d) ∧
                               datePublished(r, date) ∧
                               license(r, l)
           ∀e∀n name(e, n) ⇒  Value(n)
    ∀e∀s description(e, s) ⇒  Value(s)
  ∀e∀d datePublished(e, d) ⇒  Value(d)
        ∀e∀l license(e, l) ⇒  ContextualEntity(l)
             DataEntity(e) ≡  File(e) ⊕ Dataset(e)
                 Entity(e) ≡  DataEntity(e) ∨ ContextualEntity(e)
              ∀e Entity(e) ⇒  type(e, c) ∧ Class(c)
    ∀e ContextualEntity(e) ⇒  name(e, n)
            Mentions(R, s) ⊨  Relation(s, p, e)  ⊕  Attribute(s, p, l)
         Relation(s, p, o) ⊨  Entity(s) ∧ Property(p) ∧ Entity(o)
        Attribute(s, p, x) ⊨  Entity(s) ∧ Property(p) ∧ Value(x)
                  Value(x) ≡  x ∈ ℝ  ⊕  x ∈ 𝕊

An ROCrate(R) is defined as a self-described Root Data Entity, which describes and contains parts (data entities), which are further described in contextual entities. These terms align with their use in the RO-Crate 1.1 terminology.

The Root(r) is a type of Dataset(r) and must, as metadata, have at least the attributes name, description and datePublished, as well as a contextual entity that identifies its license. These predicates correspond to the RO-Crate 1.1 minimal requirements for the root data entity.

The concept of an Entity(e) is introduced as being either a DataEntity(e), a ContextualEntity(e), or both. Any Entity(e) must be typed with at least one Class(c), and every ContextualEntity(e) must also have a name(e,n); this corresponds to the expectations for any referenced contextual entity (see section on contextual entities).

For simplicity in this formalization (and to assist production rules below) R is a constant representing a single RO-Crate, typically written to independent RO-Crate Metadata files. R is used by Mentions(R, e) to indicate that e is an Entity described by the RO-Crate and therefore its metadata (a set of Relation and Attribute predicates) form part of the RO-Crate serialization. Relation(s, p, o) and Attribute(s, p, x) are defined as a subject-predicate-object triple pattern from an Entity(s) using a Property(p) to either another Entity(o) or a literal Value(x).
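The Root(r), Value(x) and ContextualEntity(e) requirements above can be checked mechanically. The following is a minimal Python sketch, not part of the RO-Crate specification: the Crate class, the root_problems function and the short property names ("type", "name", "license", "hasPart" and so on) are hypothetical stand-ins for the full IRIs used by the formalization.

```python
from dataclasses import dataclass, field


@dataclass
class Crate:
    """Hypothetical in-memory crate: a root id plus Relation and Attribute facts."""
    root: str
    relations: set = field(default_factory=set)   # (s, p, o) where o is an entity id
    attributes: set = field(default_factory=set)  # (s, p, x) where x is a literal value

    def objects(self, s, p):
        return {o for (s2, p2, o) in self.relations if s2 == s and p2 == p}

    def values(self, s, p):
        return {x for (s2, p2, x) in self.attributes if s2 == s and p2 == p}


def is_value(x):
    # Value(x) ≡ x ∈ ℝ ⊕ x ∈ 𝕊; bool is excluded although it subclasses int
    return isinstance(x, (int, float, str)) and not isinstance(x, bool)


def root_problems(crate):
    """List the requirements from the rules above that the root entity fails."""
    r = crate.root
    problems = []
    if "Dataset" not in crate.objects(r, "type"):
        problems.append("root is not typed as Dataset")
    for prop in ("name", "description", "datePublished"):
        vals = crate.values(r, prop)
        if not vals or not all(is_value(v) for v in vals):
            problems.append(f"root lacks attribute {prop}")
    licenses = crate.objects(r, "license")
    if not licenses:
        problems.append("root lacks a license relation")
    for lic in sorted(licenses):
        # every ContextualEntity(e) must have a name(e, n)
        if not crate.values(lic, "name"):
            problems.append(f"license {lic} lacks a name")
    if not crate.objects(r, "hasPart"):
        problems.append("root has no hasPart data entity")
    return problems


# Illustrative data only; identifiers are placeholders, not real IRIs.
crate = Crate(root="./")
crate.relations |= {("./", "type", "Dataset"),
                    ("./", "license", "licence-1"),
                    ("./", "hasPart", "survey.csv")}
crate.attributes |= {("./", "name", "Example crate"),
                     ("./", "description", "An example"),
                     ("./", "datePublished", "2017"),
                     ("licence-1", "name", "Some licence")}
print(root_problems(crate))
```

The check returns a list of unmet requirements rather than a boolean, so that a caller can report which predicate of the definition fails.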

Example of formalised RO-Crate

Below is an example RO-Crate represented using the above formalization, assuming a base IRI of

    “Data files associated with the manuscript:Effects of …”)
    “Palliative care planning for nursing home residents …”)
datePublished(<>, “2017”)
hasPart(<>, <>)
hasPart(<>, <>)

    “Creative Commons Attribution 4.0”)

    “Creative Commons Attribution Non Commercial 4.0”)

name(<>, “Survey of care providers”)

    “Audio recordings of care provider interviews”)

Notable from this triple-like formalization is that an RO-Crate R is fully represented as a tree at depth 2 helped by the use of 𝕀𝕣𝕚 nodes. For instance the aggregation from the root entity hasPart(…interviews/>) is at the same level as the data entity’s property license(…CC-BY-NC-4.0>) and that contextual entity’s attribute name (…Non Commercial 4.0”). As shown in section RO-Crate JSON-LD, the RO-Crate Metadata File serialization is an equivalent shallow tree, although at depth 3 to cater for the JSON-LD preamble of "@context" and "@graph".

In reality many additional attributes and contextual types from types like and would be used to further describe the RO-Crate and its entities, but as these are optional (SHOULD requirements) they do not form part of this formalization.

Mapping to RDF with

A formalised RO-Crate can be mapped to different serializations. Assume a simplified³ language 𝕃𝖗𝖉𝖋 based on the RDF abstract syntax [98]:

                𝕃𝖗𝖉𝖋 = { Triple(s,p,o), IRI(i), BlankNode(b), Literal(s),
                         𝕀𝕣𝕚, ℝ, 𝕊 }
                𝔻𝖗𝖉𝖋 = 𝕊
           ∀i IRI(i) ⇒ i ∈ 𝕀𝕣𝕚
∀s∀p∀o Triple(s,p,o) ⇒ ( IRI(s) ∨ BlankNode(s) ) ∧
                        IRI(p) ∧
                       ( IRI(o) ∨ BlankNode(o) ∨ Literal(o) )
          Literal(v) ⊨ Value(v) ∧ Datatype(v,t) ∧ IRI(t)
         ∀v Value(v) ⇒ v ∈ 𝕊
    LanguageTag(v,l) ≡ Datatype(v,

Below follows a mapping from 𝕃𝖗𝖔𝖈𝖗𝖆𝖙𝖊 to 𝕃𝖗𝖉𝖋 using

        Property(p) ⇒ type(p,
           Class(c) ⇒ type(c,
         Dataset(d) ⇒ type(d, <>)
            File(f) ⇒ type(f, <>)
ContextualEntity(e) ⇒ type(e, <>)
    CreativeWork(e) ⇒ ContextualEntity(e) ∧
                        type(e, <>)
      hasPart(e, t) ⇒ Relation(e, <>, t)
         name(e, n) ⇒ Attribute(e, <>, n)
  description(e, s) ⇒ Attribute(e, <>, s)
datePublished(e, d) ⇒ Attribute(e, <>, d)
      license(e, l) ⇒ Relation(e, <>, l) ∧
         type(e, t) ⇒ Relation(e,
             <>, t) ∧
          String(s) ≡ Value(s) ∧  s ∈ 𝕊
          String(s) ⇒ Datatype(s,
         Decimal(d) ≡ Value(d) ∧  d ∈ ℝ
         Decimal(d) ⇒ Datatype(d,
    Relation(s,p,o) ⇒ Triple(s,p,o) ∧ IRI(s) ∧ IRI(o)
   Attribute(s,p,o) ⇒ Triple(s,p,o) ∧ IRI(s) ∧ Literal(o)

Note that in the JSON-LD serialization of RO-Crate the expression of Class and Property is typically indirect: the JSON-LD @context maps to IRIs, which, when resolved as Linked Data, embed their formal definition as RDFa. Extensions may however include such term definitions directly in the RO-Crate.
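The last two rules of the mapping, from Relation and Attribute to Triple, can be sketched in Python as follows. This is an illustration under assumptions rather than a reference implementation: the datatype IRIs are cut off in the rules above, so the XML Schema string and decimal datatypes used here are assumed, the example.org IRIs are placeholders, and the helper names (relation, attribute, to_literal, ntriples) are hypothetical.

```python
from typing import NamedTuple, Union

XSD = "http://www.w3.org/2001/XMLSchema#"  # assumed datatype namespace


class IRI(NamedTuple):
    value: str


class Literal(NamedTuple):
    lexical: str   # ∀v Value(v) ⇒ v ∈ 𝕊: the lexical form is a string
    datatype: IRI  # Datatype(v, t) ∧ IRI(t)


class Triple(NamedTuple):
    s: IRI
    p: IRI
    o: Union[IRI, Literal]


def to_literal(x):
    """String(s) and Decimal(d) rules: attach a datatype to a Value."""
    if isinstance(x, bool):
        raise TypeError("booleans are outside this formalization")
    if isinstance(x, str):
        return Literal(x, IRI(XSD + "string"))
    if isinstance(x, (int, float)):
        # repr() is a simplification; it is not a full xsd:decimal serializer
        return Literal(repr(x), IRI(XSD + "decimal"))
    raise TypeError(f"not a Value: {x!r}")


def relation(s, p, o):
    """Relation(s,p,o) ⇒ Triple(s,p,o) ∧ IRI(s) ∧ IRI(o)"""
    return Triple(IRI(s), IRI(p), IRI(o))


def attribute(s, p, x):
    """Attribute(s,p,x) ⇒ Triple(s,p,x) ∧ IRI(s) ∧ Literal(x)"""
    return Triple(IRI(s), IRI(p), to_literal(x))


def ntriples(t):
    """Render one triple as an N-Triples line (no escaping of special characters)."""
    if isinstance(t.o, Literal):
        obj = f'"{t.o.lexical}"^^<{t.o.datatype.value}>'
    else:
        obj = f"<{t.o.value}>"
    return f"<{t.s.value}> <{t.p.value}> {obj} ."


# Placeholder IRIs for illustration only
print(ntriples(relation("http://example.org/crate/",
                        "http://example.org/vocab/hasPart",
                        "http://example.org/crate/survey.csv")))
print(ntriples(attribute("http://example.org/crate/",
                         "http://example.org/vocab/name",
                         "Survey of care providers")))
```

Blank nodes are left out of the sketch, in line with the formalization's preference for IRI identifiers.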

RO-Crate 1.1 Metadata File Descriptor

An important RO-Crate principle is that of being self-described. Therefore the serialization of the RO-Crate into a file should also describe itself in a Metadata File Descriptor, indicating it is about (describing) the RO-Crate root data entity, and that it conformsTo a particular version of the RO-Crate specification:

               about(s,o) ⇒  Relation(s, <>, o)
          conformsTo(s,o) ⇒  Relation(s,
                               <>, R)
MetadataFileDescriptor(m) ⇒  ( CreativeWork(m) ∧ about(m,R) ∧ ROCrate(R) ∧
                               <>) )

Note that although the metadata file necessarily is an information resource written to disk or served over the network (as JSON-LD), it is not considered to be a contained part of the RO-Crate in the form of a data entity, rather it is described only as a contextual entity.

In the conceptual model the RO-Crate Metadata File can be seen as the top-level node that describes the RO-Crate Root; however, in the formal model (and the JSON-LD format) the metadata file descriptor is an additional contextual entity that does not affect the depth limit of the RO-Crate.
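As a rough illustration, the descriptor can be written as one more flat JSON-LD node next to the root. In this Python sketch the function name is hypothetical, and the default identifiers (the file name ro-crate-metadata.json and the conformsTo IRI https://w3id.org/ro/crate/1.1) are assumptions taken from the RO-Crate 1.1 specification rather than from the rules above, where the IRIs are not shown.

```python
import json


def metadata_file_descriptor(root_id="./",
                             descriptor_id="ro-crate-metadata.json",
                             spec_iri="https://w3id.org/ro/crate/1.1"):
    """Build the descriptor node: CreativeWork(m) ∧ about(m, R) ∧ conformsTo(m, spec)."""
    return {
        "@id": descriptor_id,
        "@type": "CreativeWork",
        "about": {"@id": root_id},          # points at the root data entity R
        "conformsTo": {"@id": spec_iri},    # version of the specification
    }


print(json.dumps(metadata_file_descriptor(), indent=2))
```

The node refers to the root only by "@id", so adding it to the graph leaves the nesting depth of the serialization unchanged.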

Forward-chained Production Rules for JSON-LD

Combining the above predicates and mapping with rudimentary JSON templates, these forward-chaining production rules can output JSON-LD according to the RO-Crate 1.1 specification⁴:

 Mentions(R, s) ∧ Relation(s, p, o) ⇒  Mentions(R, o)
                             IRI(i) ⇒  "i"
                         Decimal(d) ⇒  d
                          String(s) ⇒  "s"
                     ∀e∀t type(e,t) ⇒  { "@id": e,
                                         "@type": t }
             ∀s∀p∀o Relation(s,p,o) ⇒  { "@id": s,
                                         p: { "@id": o }
                                       }
            ∀s∀p∀v Attribute(s,p,v) ⇒  { "@id": s,
                                         p: v
                                       }
                    ∀r∀c ROCrate(R) ⇒  { "@graph": [
                                           Mentions(r, c)*
                                         ]
                                       }
                                  R ⊨  <./>
                                  R ⇒  MetadataFileDescriptor(

This exposes the first order logic domain of discourse of IRIs, with rational numbers and strings as their corresponding JSON-LD representation. These production rules first grow the graph of R by adding a transitive rule that anything described in R which is related to o means that o is also considered mentioned by the RO-Crate R. For simplicity this rule is one-way; in theory the JSON-LD graph can also contain free-standing contextual entities that have outgoing relations to data- and contextual entities, but these are proposed to be bound to the root data entity with relation
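The production rules can be approximated by a small forward-chaining loop followed by JSON templating. The Python sketch below makes several assumptions: the function names are hypothetical, property names are passed in as plain strings, and the per-fact JSON objects of the rules are merged into one node per "@id", which is a design choice of this sketch rather than something the rules prescribe.

```python
import json


def mentioned(root, relations):
    """Forward-chain Mentions(R, s) ∧ Relation(s, p, o) ⇒ Mentions(R, o)."""
    seen = {root}
    frontier = [root]
    while frontier:
        s = frontier.pop()
        for (s2, _p, o) in relations:
            if s2 == s and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def to_jsonld(root, types, relations, attributes):
    """Emit {"@graph": [...]} with one flat node per mentioned entity."""
    nodes = {e: {"@id": e} for e in sorted(mentioned(root, relations))}
    for e, t in sorted(types):
        if e in nodes:
            nodes[e].setdefault("@type", []).append(t)
    for s, p, o in sorted(relations):
        if s in nodes:
            # Relation objects are referenced by "@id", never nested
            nodes[s].setdefault(p, []).append({"@id": o})
    for s, p, v in sorted(attributes, key=repr):
        if s in nodes:
            nodes[s].setdefault(p, []).append(v)
    return {"@graph": list(nodes.values())}


# Illustrative facts; identifiers and property names are placeholders.
types = {("./", "Dataset"), ("survey.csv", "File")}
relations = {("./", "hasPart", "survey.csv"),
             ("./", "license", "licence-1"),
             ("orphan", "about", "survey.csv")}  # not reachable from the root
attributes = {("./", "name", "Example crate"),
              ("licence-1", "name", "Some licence"),
              ("orphan", "name", "Never mentioned")}

doc = to_jsonld("./", types, relations, attributes)
print(json.dumps(doc, indent=2))
```

The entity "orphan" has an outgoing relation to a data entity but is not reachable from the root, so it is absent from the output; this is the one-way behaviour of the Mentions rule described above.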

This is an appendix to the paper Packaging research artefacts with RO-Crate by Stian Soiland-Reyes, Peter Sefton, Mercè Crosas, Leyla Jael Castro, Frederik Coppens, José M. Fernández, Daniel Garijo, Björn Grüning, Marco La Rosa, Simone Leo, Eoghan Ó Carragáin, Marc Portier, Ana Trisovic, RO-Crate Community, Paul Groth, Carole Goble.

  1. For simplicity, blank nodes are not included in this formalization, as RO-Crate recommends the use of IRI identifiers.

  2. The full list of types, relations and attribute properties from the RO-Crate specification is not included. Examples shown include datePublished, CreativeWork and name.

  3. This simplification and mapping does not cover the extensive list of literal datatypes built into RDF 1.1, only strings and decimal real numbers. Likewise, LanguageTag is deliberately not utilised below.

  4. Limitations: Contextual entities not related from the RO-Crate (e.g. using inverse relations to a data entity) would not be covered by the single direction Mentions(R, s) production rule; see issue 122. The datePublished(e, d) rule does not include syntax checks for the ISO 8601 datetime format. Compared with RO-Crate examples, this generated JSON-LD does not use a @context as the IRIs are produced unshortened; a post-step could do JSON-LD Flattening with a versioned RO-Crate context. The @type expansion is included for clarity, even though this is also implied by the type(e, t) expansion to Relation(e, rdf:type).