Formalizing RO-Crate in First Order Logic
Below is a formalization of the concept of RO-Crate as a set of relations using First Order Logic:
Language
Definition of language ℒʀᴏᴄʀᴀᴛᴇ:
ℒʀᴏᴄʀᴀᴛᴇ = { Property(p), Class(c), Value(x), ℝ, 𝕊 }
𝔻 = 𝕀𝕣𝕚
𝕀𝕣𝕚 ≡ { IRIs as defined in RFC 3987 }
ℝ ≡ { real or integer numbers }
𝕊 ≡ { literal strings }
The domain of discourse is the set of 𝕀𝕣𝕚 identifiers [42] (notation <http://example.com/>)¹, with additional descriptions using numbers ℝ (notation 13.37) and literal strings 𝕊 (notation “Hello”).
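To make the three kinds of terms concrete, the following Python sketch parses the notation used in this appendix. The IRI class and the parse_term function are hypothetical helpers. Representing ℝ as Python int/float and 𝕊 as str is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    """An identifier from 𝕀𝕣𝕚, written <...> in the notation above."""
    value: str


# A term is either an IRI, a number from ℝ (int/float) or a string from 𝕊 (str).
Term = Union[IRI, int, float, str]


def parse_term(token: str) -> Term:
    """Parse one token: <iri>, a number such as 13.37, or a quoted string such as “Hello”."""
    token = token.strip()
    if token.startswith("<") and token.endswith(">"):
        return IRI(token[1:-1])
    if len(token) >= 2 and token[0] in "“\"" and token[-1] in "”\"":
        return token[1:-1]
    try:
        return int(token)
    except ValueError:
        return float(token)  # raises ValueError if the token is not numeric either
```

With this sketch, parse_term("13.37") yields a float and parse_term("<http://example.com/>") yields IRI("http://example.com/"). The two remain distinguishable, just as 𝕀𝕣𝕚 and ℝ are in the language.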
From this formalised language ℒʀᴏᴄʀᴀᴛᴇ we can interpret an RO-Crate in any representation that can gather these descriptions, their properties, classes, and literal attributes.
Minimal RO-Crate
Below we use ℒʀᴏᴄʀᴀᴛᴇ to define a minimal² RO-Crate:
ROCrate(R) ⊨ Root(R) ∧ Mentions(R, R) ∧ hasPart(R, d) ∧
Mentions(R, d) ∧ DataEntity(d) ∧
Mentions(R, c) ∧ ContextualEntity(c)
∀r Root(r) → Dataset(r) ∧ name(r, n) ∧
description(r, d) ∧
datePublished(r, date) ∧
license(r, l)
∀e∀n name(e, n) → Value(n)
∀e∀s description(e, s) → Value(s)
∀e∀d datePublished(e, d) → Value(d)
∀e∀l license(e, l) → ContextualEntity(l)
DataEntity(e) ≡ File(e) ⊕ Dataset(e)
Entity(e) ≡ DataEntity(e) ∨ ContextualEntity(e)
∀e Entity(e) → type(e, c) ∧ Class(c)
∀e ContextualEntity(e) → name(e, n)
Mentions(R, s) ⊨ Relation(s, p, e) ⊕ Attribute(s, p, l)
Relation(s, p, o) ⊨ Entity(s) ∧ Property(p) ∧ Entity(o)
Attribute(s, p, x) ⊨ Entity(s) ∧ Property(p) ∧ Value(x)
Value(x) ≡ x ∈ ℝ ⊕ x ∈ 𝕊
An ROCrate(R) is defined as a self-described Root Data Entity, which describes and contains parts (data entities), which are further described in contextual entities. These terms align with their use in the RO-Crate 1.1 terminology.
The Root(r) is a type of Dataset(r), and must have as metadata at least the attributes name, description and datePublished, as well as a contextual entity that identifies its license. These predicates correspond to the RO-Crate 1.1 minimal requirements for the root data entity.
The concept of an Entity(e) is introduced as being either a DataEntity(e), a ContextualEntity(e), or both. Any Entity(e) must be typed with at least one Class(c), and every ContextualEntity(e) must also have a name(e, n); this corresponds to expectations for any referenced contextual entity (see section on contextual entities).
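The axioms above can be checked mechanically against a set of ground facts. The Python sketch below is a hypothetical illustration, not part of the formalization. It covers only part of the minimal RO-Crate definition: the Root requirements, hasPart with DataEntity, and the license and name constraints on contextual entities. It assumes facts are encoded as tuples such as ("Dataset", x) or ("name", x, y), and it does not check Entity typing or Mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


# Facts are tuples: ("Dataset", x) for unary predicates, ("name", x, y) for binary ones.
def holds(facts, *fact):
    return tuple(fact) in facts


def objects(facts, pred, subject):
    """All o such that pred(subject, o) is a fact."""
    return [f[2] for f in facts if len(f) == 3 and f[0] == pred and f[1] == subject]


def is_value(x):
    # Value(x) ≡ x ∈ ℝ ⊕ x ∈ 𝕊 (bool is excluded because Python treats it as an int)
    return isinstance(x, (int, float, str)) and not isinstance(x, bool)


def is_data_entity(facts, e):
    # DataEntity(e) ≡ File(e) ⊕ Dataset(e): exactly one of the two must hold
    return holds(facts, "File", e) != holds(facts, "Dataset", e)


def check_minimal_crate(facts, R):
    """Return the axioms violated for R; an empty list means R passes these checks."""
    errors = []
    if not holds(facts, "Dataset", R):
        errors.append("Root(R) → Dataset(R)")
    for prop in ("name", "description", "datePublished"):
        vals = objects(facts, prop, R)
        if not vals or not all(is_value(v) for v in vals):
            errors.append(f"Root(R) → {prop}(R, x) ∧ Value(x)")
    if not objects(facts, "license", R):
        errors.append("Root(R) → license(R, l)")
    for f in facts:
        # ∀e∀l license(e, l) → ContextualEntity(l)
        if f[0] == "license" and not holds(facts, "ContextualEntity", f[2]):
            errors.append(f"license(e, l) → ContextualEntity(l) fails for {f[2]}")
    parts = objects(facts, "hasPart", R)
    if not parts or not all(is_data_entity(facts, d) for d in parts):
        errors.append("ROCrate(R) → hasPart(R, d) ∧ DataEntity(d)")
    for f in facts:
        # ∀e ContextualEntity(e) → name(e, n)
        if f[0] == "ContextualEntity" and not objects(facts, "name", f[1]):
            errors.append(f"ContextualEntity(e) → name(e, n) fails for {f[1]}")
    return errors
```

For instance, removing the name of the license entity from an otherwise valid fact set makes check_minimal_crate report exactly one violated axiom. Declaring a part as both File and Dataset breaks the exclusive or in DataEntity.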
For simplicity in this formalization (and to assist the production rules below), R is a constant representing a single RO-Crate, typically serialized to its own RO-Crate Metadata File. R is used by Mentions(R, e) to indicate that e is an Entity described by the RO-Crate, and therefore its metadata (a set of Relation and Attribute predicates) forms part of the RO-Crate serialization. Relation(s, p, o) and Attribute(s, p, x) are defined as a subject-predicate-object triple pattern from an Entity(s) using a Property(p) to either another Entity(o) or a Value(x).
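As a sketch of the triple pattern, the Python code below classifies a triple as a Relation or an Attribute by looking only at its object. This mirrors the exclusive or in Mentions(R, s) ⊨ Relation(s, p, e) ⊕ Attribute(s, p, l). The IRI and Triple types are hypothetical, and property IRIs are abbreviated to bare names such as hasPart for readability.

```python
from dataclasses import dataclass
from typing import List, NamedTuple, Tuple, Union


@dataclass(frozen=True)
class IRI:
    value: str


class Triple(NamedTuple):
    s: IRI                          # subject: an Entity(s)
    p: IRI                          # predicate: a Property(p)
    o: Union[IRI, int, float, str]  # object: an Entity(o) or a Value(x)


def kind(t: Triple) -> str:
    """Classify a triple as Relation(s, p, o) or Attribute(s, p, x) by its object."""
    if isinstance(t.o, IRI):
        return "Relation"
    if isinstance(t.o, (int, float, str)) and not isinstance(t.o, bool):
        return "Attribute"
    raise TypeError(f"object is neither an entity nor a value: {t.o!r}")


def partition(triples: List[Triple]) -> Tuple[List[Triple], List[Triple]]:
    """Split a crate's triples into (relations, attributes); each triple lands in exactly one."""
    relations = [t for t in triples if kind(t) == "Relation"]
    attributes = [t for t in triples if kind(t) == "Attribute"]
    return relations, attributes
```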
Example of formalised RO-Crate
Below is an example RO-Crate represented using the above formalization, assuming a base IRI of http://example.com/ro/123/:
ROCrate(<http://example.com/ro/123/>)
name(<http://example.com/ro/123/>,
“Data files associated with the manuscript: Effects of …”)
description(<http://example.com/ro/123/>,
“Palliative care planning for nursing home residents …”)
license(<http://example.com/ro/123/>,
<https://spdx.org/licenses/CC-BY-4.0>)
datePublished(<http://example.com/ro/123/>, “2017”)
hasPart(<http://example.com/ro/123/>, <http://example.com/ro/123/survey.csv>)
hasPart(<http://example.com/ro/123/>, <http://example.com/ro/123/interviews/>)
ContextualEntity(<https://spdx.org/licenses/CC-BY-4.0>)
name(<https://spdx.org/licenses/CC-BY-4.0>,
“Creative Commons Attribution 4.0”)
ContextualEntity(<https://spdx.org/licenses/CC-BY-NC-4.0>)
name(<https://spdx.org/licenses/CC-BY-NC-4.0>,
“Creative Commons Attribution Non Commercial 4.0”)
File(<http://example.com/ro/123/survey.csv>)
name(<http://example.com/ro/123/survey.csv>, “Survey of care providers”)
Dataset(<http://example.com/ro/123/interviews/>)
name(<http://example.com/ro/123/interviews/>,
“Audio recordings of care provider interviews”)
license(<http://example.com/ro/123/interviews/>,
<https://spdx.org/licenses/CC-BY-NC-4.0>)
Notable from this triple-like formalization is that an RO-Crate R is fully represented as a tree at depth 2, helped by the use of 𝕀𝕣𝕚 nodes. For instance, the aggregation from the root entity hasPart(…interviews/>) is at the same level as the data entity’s property license(…CC-BY-NC-4.0>) and the contextual entity’s attribute name(…Non Commercial 4.0”). As shown in section RO-Crate JSON-LD, the RO-Crate Metadata File serialization is an equivalent shallow tree, although at depth 3 to cater for the JSON-LD preamble of "@context" and "@graph".
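One way to see the depth-2 claim is to group the example's facts first by subject and then by property. The Python sketch below does this for a subset of the facts above. IRIs used as objects stay as flat IRI leaves rather than nested subtrees, which is what keeps the tree shallow. The helper names to_tree and depth are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """Reference to another node; treated as a leaf, not as a nested subtree."""
    value: str


BASE = "http://example.com/ro/123/"
CC_BY = "https://spdx.org/licenses/CC-BY-4.0"
CC_BY_NC = "https://spdx.org/licenses/CC-BY-NC-4.0"

# A subset of the example facts as (subject, property, object) triples.
TRIPLES = [
    (BASE, "datePublished", "2017"),
    (BASE, "license", IRI(CC_BY)),
    (BASE, "hasPart", IRI(BASE + "survey.csv")),
    (BASE, "hasPart", IRI(BASE + "interviews/")),
    (CC_BY, "name", "Creative Commons Attribution 4.0"),
    (CC_BY_NC, "name", "Creative Commons Attribution Non Commercial 4.0"),
    (BASE + "survey.csv", "name", "Survey of care providers"),
    (BASE + "interviews/", "name", "Audio recordings of care provider interviews"),
    (BASE + "interviews/", "license", IRI(CC_BY_NC)),
]


def to_tree(triples):
    """Group triples as subject → property → [objects]."""
    tree = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        tree[s][p].append(o)
    return {s: dict(props) for s, props in tree.items()}


def depth(node) -> int:
    """Count nested dict levels; lists are transparent, IRIs and literals are leaves."""
    if isinstance(node, dict):
        return 1 + max((depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return max((depth(v) for v in node), default=0)
    return 0
```

Here to_tree(TRIPLES) has depth 2. The root's hasPart list and the interviews' license list both sit at the second level.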
In practice, many additional attributes and contextual entity types from Schema.org, such as http://schema.org/affiliation and http://schema.org/Organization, would be used to further describe the RO-Crate and its entities, but as these are optional (SHOULD requirements) they do not form part of this formalization.
Mapping to RDF with Schema.org
A formalised RO-Crate can be mapped to different serializations. Assume a simplified³ language ℒʀᴅꜰ based on the RDF abstract syntax [98]:
ℒʀᴅꜰ = { Triple(s,p,o), IRI(i), BlankNode(b), Literal(s),
𝕀𝕣𝕚, ℝ, 𝕊 }
𝔻ʀᴅꜰ = 𝕊
∀i IRI(i) → i ∈ 𝕀𝕣𝕚
∀s∀p∀o Triple(s,p,o) → ( IRI(s) ∨ BlankNode(s) ) ∧
IRI(p) ∧
( IRI(o) ∨ BlankNode(o) ∨ Literal(o) )
Literal(v) ⊨ Value(v) ∧ Datatype(v,t) ∧ IRI(t)
∀v Value(v) → v ∈ 𝕊
LanguageTag(v,l) ≡ Datatype(v,
<http://www.w3.org/1999/02/22-rdf-syntax-ns#langString>)
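The well-formedness constraint on Triple can be checked mechanically. Below is a small Python sketch that assumes hypothetical IRI, BlankNode and Literal classes of our own, not those of any RDF library. A numeric lexical form such as 13.37 is rejected, because ∀v Value(v) → v ∈ 𝕊 requires literal values in ℒʀᴅꜰ to be strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: object     # lexical form; must be a string (∀v Value(v) → v ∈ 𝕊)
    datatype: object  # Datatype(v, t) ∧ IRI(t)


def is_literal(o) -> bool:
    """Literal(v) ⊨ Value(v) ∧ Datatype(v, t) ∧ IRI(t), with Value restricted to strings."""
    return isinstance(o, Literal) and isinstance(o.value, str) and isinstance(o.datatype, IRI)


def is_well_formed(s, p, o) -> bool:
    """Triple(s,p,o) → (IRI(s) ∨ BlankNode(s)) ∧ IRI(p) ∧ (IRI(o) ∨ BlankNode(o) ∨ Literal(o))."""
    return (isinstance(s, (IRI, BlankNode))
            and isinstance(p, IRI)
            and (isinstance(o, (IRI, BlankNode)) or is_literal(o)))
```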
Below follows a mapping from ℒʀᴏᴄʀᴀᴛᴇ to ℒʀᴅꜰ using Schema.org.
Property(p) → type(p,
<http://www.w3.org/2000/01/rdf-schema#Property>)
Class(c) → type(c,
<http://www.w3.org/2000/01/rdf-schema#Class>)
Dataset(d) → type(d, <http://schema.org/Dataset>)
File(f) → type(f, <http://schema.org/MediaObject>)
ContextualEntity(e) → type(e, <http://schema.org/Thing>)
CreativeWork(e) → ContextualEntity(e) ∧
type(e, <http://schema.org/CreativeWork>)
hasPart(e, t) → Relation(e, <http://schema.org/hasPart>, t)
name(e, n) → Attribute(e, <http://schema.org/name>, n)
description(e, s) → Attribute(e, <http://schema.org/description>, s)
datePublished(e, d) → Attribute(e, <http://schema.org/datePublished>, d)
license(e, l) → Relation(e, <http://schema.org/license>, l) ∧
CreativeWork(l)
type(e, t) → Relation(e,
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, t) ∧
Class(t)
String(s) ≡ Value(s) ∧ s ∈ 𝕊
String(s) → Datatype(s,
<http://www.w3.org/2001/XMLSchema#string>)
Decimal(d) ≡ Value(d) ∧ d ∈ ℝ
Decimal(d) → Datatype(d,
<http://www.w3.org/2001/XMLSchema#decimal>)
Relation(s,p,o) → Triple(s,p,o) ∧ IRI(s) ∧ IRI(o)
Attribute(s,p,o) → Triple(s,p,o) ∧ IRI(s) ∧ Literal(o)
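The mapping rules above can be read as a translation function. Below is a partial Python sketch. It assumes facts are encoded as tuples, with IRIs as plain strings and literals as (lexical form, datatype IRI) pairs. It covers only the class, relation and attribute rules listed above; the function and constant names are hypothetical.

```python
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"

CLASSES = {  # unary predicates → Schema.org classes
    "Dataset": SCHEMA + "Dataset",
    "File": SCHEMA + "MediaObject",
    "ContextualEntity": SCHEMA + "Thing",
}
RELATIONS = {"hasPart", "license"}                     # object is an IRI
ATTRIBUTES = {"name", "description", "datePublished"}  # object is a Value


def literal(v):
    """String(s) → xsd:string, Decimal(d) → xsd:decimal; returns (lexical form, datatype)."""
    if isinstance(v, str):
        return (v, XSD + "string")
    if isinstance(v, (int, float)) and not isinstance(v, bool):
        # note: str() of very small/large floats uses exponent notation,
        # which is not a valid xsd:decimal lexical form; ignored in this sketch
        return (str(v), XSD + "decimal")
    raise TypeError(f"not a Value: {v!r}")


def to_rdf(facts):
    """Map ℒʀᴏᴄʀᴀᴛᴇ facts to (s, p, o) triples; IRIs are plain strings, literals are tuples."""
    triples = set()
    for fact in facts:
        pred = fact[0]
        if len(fact) == 2 and pred in CLASSES:
            triples.add((fact[1], RDF_TYPE, CLASSES[pred]))
        elif len(fact) == 3 and pred in RELATIONS:
            triples.add((fact[1], SCHEMA + pred, fact[2]))
            if pred == "license":
                # license(e, l) → … ∧ CreativeWork(l), and CreativeWork(l) → ContextualEntity(l)
                triples.add((fact[2], RDF_TYPE, SCHEMA + "CreativeWork"))
                triples.add((fact[2], RDF_TYPE, SCHEMA + "Thing"))
        elif len(fact) == 3 and pred in ATTRIBUTES:
            triples.add((fact[1], SCHEMA + pred, literal(fact[2])))
    return triples
```

A full mapping would also emit the rdfs:Property and rdfs:Class declarations for Property(p) and Class(c), which this sketch leaves out.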
Note that in the JSON-LD serialization of RO-Crate the expression of Class and Property is typically indirect: the JSON-LD @context maps terms to Schema.org IRIs, which, when resolved as Linked Data, embed their formal definitions as RDFa. Extensions may, however, include such term definitions directly in the RO-Crate.
RO-Crate 1.1 Metadata File Descriptor
An important RO-Crate principle is that of being self-described. Therefore the serialization of the RO-Crate into a file should also describe itself in a Metadata File Descriptor, indicating that it is about (describing) the RO-Crate root data entity, and that it conformsTo a particular version of the RO-Crate specification:
about(s,o) → Relation(s, <http://schema.org/about>, o)
conformsTo(s,o) → Relation(s,
<http://purl.org/dc/terms/conformsTo>, o)
MetadataFileDescriptor(m) → ( CreativeWork(m) ∧ about(m,R) ∧ ROCrate(R) ∧
conformsTo(m,
<https://w3id.org/ro/crate/1.1>) )
Note that although the metadata file is necessarily an information resource written to disk or served over the network (as JSON-LD), it is not considered a contained part of the RO-Crate in the form of a data entity; rather, it is described only as a contextual entity.
In the conceptual model the RO-Crate Metadata File can be seen as the top-level node that describes the RO-Crate Root; however, in the formal model (and the JSON-LD format) the metadata file descriptor is an additional contextual entity that does not affect the depth limit of the RO-Crate.
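As a sketch, the descriptor can be written as an expanded JSON-LD node with unshortened IRIs, in the same style as the output of the production rules in this appendix. The function names below are hypothetical, and the check covers the CreativeWork, about and conformsTo conjuncts but not ROCrate(R) itself. With the RO-Crate 1.1 @context, the same node would typically be compacted to the keys "about" and "conformsTo" and the type "CreativeWork".

```python
import json

SCHEMA = "http://schema.org/"
DCT_CONFORMS_TO = "http://purl.org/dc/terms/conformsTo"
RO_CRATE_1_1 = "https://w3id.org/ro/crate/1.1"


def metadata_descriptor(root_id: str = "./", file_id: str = "ro-crate-metadata.json") -> dict:
    """CreativeWork(m) ∧ about(m, R) ∧ conformsTo(m, <https://w3id.org/ro/crate/1.1>) as a JSON-LD node."""
    return {
        "@id": file_id,
        "@type": SCHEMA + "CreativeWork",
        SCHEMA + "about": {"@id": root_id},
        DCT_CONFORMS_TO: {"@id": RO_CRATE_1_1},
    }


def is_metadata_descriptor(node: dict, root_id: str) -> bool:
    """Check the CreativeWork, about and conformsTo conjuncts for a given root R."""
    return (node.get("@type") == SCHEMA + "CreativeWork"
            and node.get(SCHEMA + "about") == {"@id": root_id}
            and node.get(DCT_CONFORMS_TO) == {"@id": RO_CRATE_1_1})


if __name__ == "__main__":
    print(json.dumps(metadata_descriptor(), indent=2))
```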
Forward-chained Production Rules for JSON-LD
Combining the above predicates and Schema.org mapping with rudimentary JSON templates, these forward-chaining production rules can output JSON-LD according to the RO-Crate 1.1 specification⁴:
Mentions(R, s) ∧ Relation(s, p, o) → Mentions(R, o)
IRI(i) → "i"
Decimal(d) → d
String(s) → "s"
∀e∀t type(e,t) → { "@id": e,
"@type": t
}
∀s∀p∀o Relation(s,p,o) → { "@id": s,
p: { "@id": o }
}
∀s∀p∀v Attribute(s,p,v) → { "@id": s,
p: v
}
∀c ROCrate(R) → { "@graph": [
Mentions(R, c)*
]
}
R ⊨ <./>
R → MetadataFileDescriptor(<ro-crate-metadata.json>)
This exposes the first-order logic domain of discourse (IRIs, numbers and strings) in its corresponding JSON-LD representation: IRIs and strings become JSON strings, and numbers become JSON (rational) numbers. These production rules first grow the graph of R by adding a transitive rule: anything described in R that is related to o means that o is also considered mentioned by the RO-Crate R. For simplicity this rule is one-way; in theory the JSON-LD graph can also contain free-standing contextual entities that have outgoing relations to data and contextual entities, but these are proposed to be bound to the root data entity with the Schema.org relation http://schema.org/mentions.
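The forward chaining of Mentions and the JSON templates can be sketched in Python as follows. This is a hypothetical illustration, not the paper's implementation. Facts are passed in as lists of tuples with full IRIs as strings. The per-fact objects that the templates emit are merged into one node per @id, which is how JSON-LD processing treats separate node objects that share an @id. In the demo, the entity https://example.org/orphan is related to the root only through an outgoing relation of its own, so the one-way rule leaves it out of the graph.

```python
import json

SCHEMA = "http://schema.org/"


def mentioned(R, relations):
    """Forward-chain Mentions(R, s) ∧ Relation(s, p, o) → Mentions(R, o) until nothing new is added."""
    mentions = {R}  # Mentions(R, R)
    changed = True
    while changed:
        changed = False
        for s, _p, o in relations:
            if s in mentions and o not in mentions:
                mentions.add(o)
                changed = True
    return mentions


def to_jsonld(R, types, relations, attributes):
    """Apply the JSON templates to every mentioned entity, merging objects that share an @id."""
    nodes = {e: {"@id": e} for e in sorted(mentioned(R, relations))}
    for e, t in types:
        if e in nodes:
            nodes[e].setdefault("@type", []).append(t)
    for s, p, o in relations:
        if s in nodes:
            nodes[s].setdefault(p, []).append({"@id": o})
    for s, p, v in attributes:
        if s in nodes:
            nodes[s].setdefault(p, []).append(v)
    # R → MetadataFileDescriptor(<ro-crate-metadata.json>)
    nodes["ro-crate-metadata.json"] = {
        "@id": "ro-crate-metadata.json",
        "@type": [SCHEMA + "CreativeWork"],
        SCHEMA + "about": [{"@id": R}],
        "http://purl.org/dc/terms/conformsTo": [{"@id": "https://w3id.org/ro/crate/1.1"}],
    }
    return {"@graph": list(nodes.values())}


if __name__ == "__main__":
    relations = [
        ("./", SCHEMA + "hasPart", "survey.csv"),
        ("survey.csv", SCHEMA + "license", "https://spdx.org/licenses/CC-BY-4.0"),
        ("https://example.org/orphan", SCHEMA + "about", "./"),  # not reachable from R
    ]
    doc = to_jsonld("./", [("./", SCHEMA + "Dataset")], relations,
                    [("survey.csv", SCHEMA + "name", "Survey of care providers")])
    print(json.dumps(doc, indent=2))
```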
This is an appendix to the paper Packaging research artefacts with RO-Crate by Stian Soiland-Reyes, Peter Sefton, Mercè Crosas, Leyla Jael Castro, Frederik Coppens, José M. Fernández, Daniel Garijo, Björn Grüning, Marco La Rosa, Simone Leo, Eoghan Ó Carragáin, Marc Portier, Ana Trisovic, RO-Crate Community, Paul Groth, Carole Goble.
1. For simplicity, blank nodes are not included in this formalization, as RO-Crate recommends the use of IRI identifiers.
2. The full list of types, relations and attribute properties from the RO-Crate specification is not included. Examples shown include datePublished, CreativeWork and name.
3. This simplification and mapping does not cover the extensive list of literal datatypes built into RDF 1.1, only strings and decimal real numbers. Likewise, LanguageTag is deliberately not utilised in the mapping.
4. Limitations: Contextual entities that are not related from the RO-Crate (e.g. those using inverse relations to a data entity) would not be covered by the single-direction Mentions(R, s) production rule; see issue 122. The datePublished(e, d) rule does not include syntax checks for the ISO 8601 datetime format. Compared with RO-Crate examples, this generated JSON-LD does not use an @context, as the IRIs are produced unshortened; a post-step could do JSON-LD Flattening with a versioned RO-Crate context. The @type expansion is included for clarity, even though it is also implied by the type(e, t) expansion to Relation(e, rdf:type, t).