W3C PROV Implementations: Preliminary Analysis

By Khalid Belhajjame, syndicated from https://khalidbelhajjame.wordpress.com/2013/04/04/w3c-prov-implementations/

At the beginning of December 2012, the W3C Provenance Working Group issued a call for implementations. As of 25 February 2013, 64 PROV implementations had been reported to the working group.

These implementations took different forms, ranging from standalone applications (30), to reusable frameworks and libraries (10), to services hosted by third parties (9), to vocabularies (21), and constraint validation modules (3).

The objective of this blog post is to examine how PROV is being used. In particular, I will identify the PROV concepts that are commonly used, and attempt to explain the figures obtained. In my analysis, I will focus on the first three components of PROV, viz., component 1 (Entities and Activities), component 2 (Derivations), and component 3 (Agents, Responsibility, and Influence).

The chart in Figure 1 summarizes the usage of PROV concepts in implementations of type Application, Framework/API, and Service. In total, 40 implementations fall into those categories. The chart distinguishes between the consumption and the generation of a given concept by an implementation. For each concept, the chart shows the number of implementations that consume instances of that concept, produce them, or both consume and produce them.
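To make the three categories concrete, here is a minimal Python sketch of how such a tally could be computed. The implementation names and their consume/produce sets are hypothetical placeholders, not data from the survey, and the `tally` helper is only an illustration of the counting scheme.

```python
from collections import Counter

# Hypothetical survey records: for each implementation, the PROV concepts
# it consumes and the ones it produces (illustrative, not survey data).
reports = {
    "impl-a": {"consumes": {"Entity", "Activity"}, "produces": {"Entity"}},
    "impl-b": {"consumes": set(), "produces": {"Entity", "Agent"}},
    "impl-c": {"consumes": {"Agent"}, "produces": set()},
}


def tally(reports):
    """Count, per concept, the implementations that only consume it,
    only produce it, or both consume and produce it."""
    counts = {}
    for report in reports.values():
        consumed, produced = report["consumes"], report["produces"]
        for concept in consumed | produced:
            if concept in consumed and concept in produced:
                mode = "both"
            elif concept in consumed:
                mode = "consume only"
            else:
                mode = "produce only"
            counts.setdefault(concept, Counter())[mode] += 1
    return counts


coverage = tally(reports)
for concept in sorted(coverage):
    print(concept, dict(coverage[concept]))
```

Each implementation contributes to exactly one of the three categories per concept, which is why the categories for a concept can be stacked in a single bar.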

Figure 1: Coverage of PROV concepts in implementations. Implementation types include Application, Framework / API, or Service.

The analysis of this chart shows that the concepts in the three components (Entities and Activities; Derivations; and Agents, Responsibility, and Influence) are covered by the implementations. However, the frequency with which those concepts are covered varies. In particular, we observe that a large proportion of implementations support most of the core concepts of PROV.

PROV core concepts are illustrated in Figure 2. Specifically, the core concepts Entity, Activity, Agent, Usage, and Generation are supported by almost all implementations. Association and Derivation are supported by more than ¾ of the implementations.

Figure 2: PROV core concepts. An entity was derived from another entity, and was generated by an activity, which used the previous entity. The activity was associated with an agent (like a person), which the entity was attributed to.
Reused from https://www.w3.org/TR/prov-dm/ under the terms of the W3C Document License.
Copyright © 2011–2013 World Wide Web Consortium, (MIT, ERCIM, Keio, Beihang).
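The relations in the figure can also be written down as data. The following is a minimal sketch that represents each core relation as a plain Python tuple, with argument order following PROV-DM (for example, `used(activity, entity)` and `wasGeneratedBy(entity, activity)`). The `ex:` identifiers and the `related` helper are hypothetical and exist only for illustration; this is not the API of any PROV library.

```python
from collections import namedtuple

# A relation is a (kind, subject, object) triple; kinds mirror PROV-DM names.
Relation = namedtuple("Relation", ["kind", "subject", "object"])

# Hypothetical trace: a report is compiled from a draft by Alice.
trace = [
    Relation("used", "ex:compile", "ex:draft"),               # activity used entity
    Relation("wasGeneratedBy", "ex:report", "ex:compile"),    # entity generated by activity
    Relation("wasDerivedFrom", "ex:report", "ex:draft"),      # entity derived from entity
    Relation("wasAssociatedWith", "ex:compile", "ex:alice"),  # activity associated with agent
    Relation("wasAttributedTo", "ex:report", "ex:alice"),     # entity attributed to agent
]


def related(trace, kind, subject):
    """Return the objects linked to `subject` by relations of the given kind."""
    return [r.object for r in trace if r.kind == kind and r.subject == subject]


print(related(trace, "wasGeneratedBy", "ex:report"))
print(related(trace, "wasAttributedTo", "ex:report"))
```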

On the other hand, we observe that the core concepts of Attribution, Communication, and Delegation are supported by fewer than half of the implementations. Specifically, 19 out of 40 implementations support Attribution, 14 support Delegation, and 12 support Communication.
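As a quick check of these proportions, the counts can be turned into shares of the 40 implementations. This is a small sketch using only the numbers quoted above; the variable names are arbitrary.

```python
total = 40  # implementations of type Application, Framework/API, or Service
supporting = {"Attribution": 19, "Delegation": 14, "Communication": 12}

# Share of implementations supporting each concept.
shares = {concept: count / total for concept, count in supporting.items()}
for concept, share in shares.items():
    print(f"{concept}: {share:.1%}")
```

All three shares fall below 50%, with Attribution the closest at 19 out of 40.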

In the case of Attribution and Communication, one can argue that they are actually (indirectly) supported by most implementations. This is because Attribution can be inferred from a chain of Generation and Association, both of which are supported by most implementations. Similarly, Communication can be inferred from a chain of Generation and Usage, both of which are also supported by most implementations.
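The two chains described above can be sketched as simple joins over relation pairs. This is a minimal illustration, assuming relations are stored as Python tuples in PROV-DM argument order; the function names and `ex:` identifiers are hypothetical. The attribution rule in particular encodes the argument made in this post and is treated here as an assumption, not as a rule quoted from the PROV specifications.

```python
def infer_communication(generated_by, used):
    """Derive (informed, informant) activity pairs: a2 was informed by a1
    whenever a2 used an entity that a1 generated."""
    return {
        (a2, a1)
        for (entity, a1) in generated_by
        for (a2, used_entity) in used
        if entity == used_entity
    }


def infer_attribution(generated_by, associated_with):
    """Derive (entity, agent) pairs: an entity is attributed to an agent
    associated with the activity that generated it (assumed rule)."""
    return {
        (entity, agent)
        for (entity, activity) in generated_by
        for (assoc_activity, agent) in associated_with
        if activity == assoc_activity
    }


# Hypothetical trace: Alice writes a draft, Bob compiles it into a report.
generated_by = {("ex:draft", "ex:write"), ("ex:report", "ex:compile")}
used = {("ex:compile", "ex:draft")}
associated_with = {("ex:write", "ex:alice"), ("ex:compile", "ex:bob")}

communication = infer_communication(generated_by, used)
attribution = infer_attribution(generated_by, associated_with)
print(communication)
print(attribution)
```

An implementation that records only Generation, Usage, and Association therefore still carries enough information for a consumer to reconstruct these two relations under the stated assumptions.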

We also observe that a large number of implementations support Plan, which is not part of the core concepts illustrated in Figure 2. Half of the implementations support this concept. This can be explained by the fact that many implementers felt the need to link the provenance traces produced by their system to the recipe that was followed.

Figure 3: Coverage of PROV by vocabularies that use PROV

Figure 4: Coverage of PROV by vocabularies that extend PROV

Figures 3 and 4 illustrate the PROV concepts that are used and extended, respectively, by implementations of type Vocabulary. In total, the working group received 8 vocabularies that use PROV concepts, and 13 vocabularies that extend them.

The two charts confirm the observation made for implementations of type Application, Framework/API, and Service. Most PROV concepts seem to be used and extended by vocabularies. The frequency with which they are supported differs from one concept to another. In particular, most of the core concepts of PROV are supported by the majority of vocabularies.

It is worth underlining that PROV is still in the process of being adopted. The existing implementations that we analyzed in this blog post show how PROV constructs create a firm foundation for provenance interoperability.