Categories

Archive
Wf4Ever project Wf4Ever was a research project funded by the EU Framework 7 programme to investigate how scientific workflows and their data could be better preserved for reproducibility, reuse and resilience against workflow decay.
What exactly happened to LSID? It would seem to have been a technically sound approach, and one whose failure we would do well to learn more from.
PROV released as W3C Recommendations The Provenance Working Group was chartered to develop a framework for interchanging provenance on the Web. The Working Group has now published the PROV Family of Documents as W3C Recommendations, along with corresponding supporting notes. You can find a complete list of the documents in the PROV Overview Note. PROV enables one to represent and interchange provenance information using widely available formats such as RDF and XML. In addition, it provides definitions for accessing provenance information, validating it, and mapping to Dublin Core.
W3C PROV Implementations: Preliminary Analysis By Khalid Belhajjame, syndicated from https://khalidbelhajjame.wordpress.com/2013/04/04/w3c-prov-implementations/ At the beginning of December 2012, the W3C Provenance Working Group issued a call for implementations. As of 25 February 2013, 64 PROV implementations had been reported to the W3C Provenance Working Group. These implementations took different forms, ranging from standalone applications (30), to reusable frameworks and libraries (10), to services hosted by third parties (9), to vocabularies (21), and constraint validation modules (3).
Recording authorship, curation and digital creation with the PAV ontology PAV is a lightweight ontology for tracking Provenance, Authoring and Versioning. PAV supplies terms for distinguishing between the different roles of the agents contributing content in current web-based systems: contributors, authors, curators and digital artifact creators. The ontology also provides terms for tracking the provenance of digital entities that are published on the web and then accessed, transformed and consumed.
Tutorial on the W3C PROV family of specifications Posted by Khalid Belhajjame Provenance, a form of structured metadata designed to record the origin or source of information, can be instrumental in deciding whether information is to be trusted, how it can be integrated with other diverse information sources, and how to establish attribution of information to authors throughout its history. The PROV set of specifications, produced by the World Wide Web Consortium (W3C), is designed to promote the publication of provenance information on the Web, and offers a basis for interoperability across diverse provenance management systems.
What can provenance do for me? 2013-03-21 Also available on Slideshare, as PDF and as PPTX. This presentation was originally given at the Metagenomics, metagenetics and Phylogenetic workflows for Ocean Sampling Day Workshop at the Max Planck Institute for Marine Microbiology on 2013-03-21 by Stian Soiland-Reyes. Reuse allowed under the Creative Commons Attribution 3.0 license.
Stian's web sites in ye olden days Archived personal websites These are slightly embarrassing archives from more innocent times, preserved so you can see why I didn’t become a Web designer. s11.no ≔ з11.ею (2020–) – The current Hugo-based static website, launched primarily as a home for archived HTML pages. Based on ronu-hugo-theme by Deepak Karanth, see stain/s11.no. The alternative domain https://з11.ею/ is available to test UTF-8 support. s11.no (2015–2019) – Bootstrap-based landing page for Søiland Software, linking to whatever development Web services were running at the time.
Conference
Updating Linked Data practices for FAIR Digital Object principles Talk abstract presented at FAIR Digital Objects conference (FDO2022)
Creating lightweight FAIR Digital Objects with RO-Crate Poster abstract presented at FAIR Digital Objects conference (FDO2022)
Incrementally building FAIR Digital Objects with Specimen Data Refinery workflows Poster abstract presented at FAIR Digital Objects conference (FDO2022)
Enhancing RDM in Galaxy by integrating RO-Crate Poster abstract presented at FAIR Digital Objects conference (FDO2022)
ISO 23494: Biotechnology – Provenance Information Model for Biological Specimen and Data Poster abstract accepted for Provenance Week 2020
Practical webby FDOs with RO-Crate and FAIR Signposting Article submitted for proceedings of International FAIR Digital Objects Implementation Summit 2024
RO-Crate, a lightweight approach to Research Object data packaging Conference abstract presented at Workshop on Research Objects 2019 (RO2019)
PhD
Glossary Glossary of terms and acronyms used in my PhD thesis
Chapter 1: Introduction Introduction and overview of PhD thesis and its research questions
Chapter 2: Background In this chapter, we discuss the related work with respect to FAIR Digital Objects and Linked Data. We do so by looking through the lens of development of these technologies over time, including future directions.
Chapter 3: FAIR Digital Objects and Linked Data To investigate RQ1, this chapter evaluates both Linked Data and FAIR Digital Object (FDO) as ways to realize the FAIR principles.
Evaluating FAIR Digital Object and Linked Data as distributed object systems Journal article published in PeerJ Computer Science
Updating Linked Data practices for FAIR Digital Object principles Talk abstract presented at FAIR Digital Objects conference (FDO2022)
Chapter 4: RO-Crate This chapter introduces RO-Crate, a pragmatic method of packaging data alongside structured metadata that is in line with the FAIR principles. This has been implemented to investigate RQ2.
Packaging research artefacts with RO-Crate Journal article published in Data Science
Creating lightweight FAIR Digital Objects with RO-Crate Poster abstract presented at FAIR Digital Objects conference (FDO2022)
Formalizing RO-Crate in First Order Logic Appendix from Journal article published in Data Science
Chapter 5: Computational Workflows To investigate RQ3, and considering that important parts of the FAIR principles include reuse and provenance, this chapter examines in closer detail how FAIR Digital Objects and RO-Crate can be used with Computational Workflows.
Making Canonical Workflow Building Blocks interoperable across workflow languages Journal article published in Data Intelligence
The Specimen Data Refinery: A canonical workflow framework and FAIR Digital Object approach to speeding up digital mobilisation of natural history collections Journal article published in Data Intelligence
Incrementally building FAIR Digital Objects with Specimen Data Refinery workflows Poster abstract presented at FAIR Digital Objects conference (FDO2022)
Recording provenance of workflow runs with RO-Crate Journal article published in PLOS One
Chapter 6: Discussion and conclusions Overall considerations of this thesis.
Discussion Discussion of findings from this thesis, relating them to emerging related work and future directions.
Conclusions Conclusions for the research questions raised in the introduction.
References An aggregated list of references from the chapters of this PhD thesis.
Appendix A: Acknowledgements Acknowledgements for the PhD thesis
Appendix B: Contributions Contributions for each chapter of this thesis, and listing all the other contributors and their affiliations.
Appendix C: Supplements Supplementary publications from my PhD thesis
Practical provenance
PROV-N Cheat Sheet This is a quick “cheat sheet” for the PROV-N syntax.
Installing ProvToolbox on macOS ProvToolbox is a useful command line tool for validating and visualizing PROV documents, but unfortunately it can be a bit of a challenge to install on Windows and on macOS because of its dependency requirements. This post suggests three step-by-step methods of installing ProvToolbox on your Mac; you should follow the method you feel most comfortable with, but can try the other methods in case of problems. Contents: Overview of requirements; Software packaging for macOS; Conda (Installing Graphviz and OpenJDK with Conda); Homebrew (Installing Graphviz with Homebrew; Installing OpenJDK with Homebrew); Installing manually (Installing AdoptOpenJDK manually; Installing Graphviz manually); Installing ProvToolbox; Using ProvToolbox from VSCode. Overview of requirements: As of 2020-08, ProvToolbox 0.
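Whichever installation method you choose, a quick check that the dependencies ended up on your `PATH` can save some debugging. The Python sketch below assumes the usual command names `java` (OpenJDK), `dot` (Graphviz) and `provconvert` (ProvToolbox's command line tool); the helper `check_prerequisites` is hypothetical and not part of ProvToolbox.

```python
import shutil

# Assumed command names for each dependency; adjust if your installation differs.
REQUIRED_COMMANDS = {
    "java": "OpenJDK (Java runtime for ProvToolbox)",
    "dot": "Graphviz (for visualising PROV documents)",
    "provconvert": "ProvToolbox",
}


def check_prerequisites(commands=REQUIRED_COMMANDS):
    """Map each command name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in commands}


if __name__ == "__main__":
    for name, path in check_prerequisites().items():
        print(f"{name:12} {path or 'missing: install ' + REQUIRED_COMMANDS[name]}")
```

A `missing` line points at the package whose installation step to revisit, for instance opening a new terminal after a Conda or Homebrew install so the updated `PATH` is picked up.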
Installing ProvToolbox in Windows While there are several tools available for validating and visualizing PROV, ProvToolbox is perhaps the most useful for validating PROV-N syntax. However, the normal releases do not run on Windows due to an operating system restriction on command line and folder path length. We have suggested a fix, but while we wait for it, here we describe a patch build that should work on Windows. We also show how to install the dependencies: Java for executing ProvToolbox, and Graphviz for visualization.
Attribution vs association A valid question when writing provenance in both the responsibility view and the process view is: should we attribute entities to their contributors, when that is already what the activities are showing? In this blog post we explore the different options. Especially with roles, it may seem unnecessary to also declare wasAttributedTo statements. It is true that from wasAttributedTo(ex:entity, ex:agent) you can conclude that there was some activity X such that wasGeneratedBy(ex:entity, X) and wasAssociatedWith(X, ex:agent). This conclusion follows from the constraint on agents and the definition of wasAttributedTo.
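The attribution inference above can be sketched as a small Python function over a hypothetical tuple representation of PROV statements (this is an illustration, not the API of any PROV library). Each wasAttributedTo(e, ag) yields a wasGeneratedBy and a wasAssociatedWith through a fresh placeholder activity, standing in for the activity X that is only known to exist.

```python
from itertools import count


def infer_from_attribution(statements):
    """Apply the attribution inference: for each wasAttributedTo(e, ag) there
    is some activity X with wasGeneratedBy(e, X) and wasAssociatedWith(X, ag).

    Statements are plain tuples such as ("wasAttributedTo", "ex:entity", "ex:agent");
    this representation is a hypothetical illustration only.
    """
    fresh = count(1)  # numbering for placeholder (existential) activities
    inferred = []
    for relation, *args in statements:
        if relation != "wasAttributedTo":
            continue
        entity, agent = args
        activity = f"?activity{next(fresh)}"  # unknown activity, only known to exist
        inferred.append(("wasGeneratedBy", entity, activity))
        inferred.append(("wasAssociatedWith", activity, agent))
    return inferred


if __name__ == "__main__":
    for statement in infer_from_attribution([("wasAttributedTo", "ex:entity", "ex:agent")]):
        print(statement)
```

The placeholder activity has no identifier, time or role of its own, which shows how little the inferred statements say compared with an explicitly described activity.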
Multiple agents sharing roles Assuming the task of writing provenance for a student group exercise, consider the question: Do we need to assign everyone in the group a specific role since in our group we found that for many of the tasks, everyone worked together to complete it? MSc Student in Understanding Data and their Environment, University of Manchester, 2020 This blog post explores the different PROV patterns that could describe this scenario.
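One possible pattern for this scenario is to associate every group member with the activity using the same role. The Python sketch below generates such PROV-N statements; the identifiers `ex:writeReport`, `ex:alice`, `ex:bob` and the role `ex:author` are hypothetical, and this is just one candidate pattern, not necessarily the one the post recommends.

```python
def shared_role_associations(activity, agents, role):
    """Return one PROV-N wasAssociatedWith statement per agent, all with the
    same prov:role, expressing that the agents took part in the activity in
    the same capacity. '-' marks the absent (optional) plan argument."""
    return [
        f"wasAssociatedWith({activity}, {agent}, -, [prov:role='{role}'])"
        for agent in agents
    ]


if __name__ == "__main__":
    for line in shared_role_associations("ex:writeReport", ["ex:alice", "ex:bob"], "ex:author"):
        print(line)
```

The single-quoted `'ex:author'` is PROV-N's notation for a qualified name literal, so the role value is itself a term in the `ex` namespace rather than a plain string.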
What are good PROV-N prefixes? In this blog post we explore the role of PROV-N prefixes and how to decide on a good namespace for your own custom provenance terms. Most examples of PROV-N use example prefixes like: prefix ex <http://example.com/> prefix exg <http://example.org/government> These example domains are explicitly reserved globally for all kinds of examples and training material, and deliberately do not have any content, advertisements or affiliations. Assume you are writing the provenance of a student group exercise: should you be using the prefix/namespace ex and example.
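In PROV-N, a qualified name such as `ex:alice` is expanded by concatenating the namespace IRI declared for its prefix with the local part. The Python sketch below is a simplification (it only handles `prefix` declarations, not `default` namespaces or the full PROV-N grammar), but it shows why the exact namespace IRI matters, including its trailing `/` or `#`.

```python
import re

# Simplified pattern for a PROV-N prefix declaration, e.g. "prefix ex <http://example.com/>".
PREFIX_DECL = re.compile(r"^\s*prefix\s+([\w-]+)\s+<([^>]*)>\s*$")


def parse_prefixes(provn_text):
    """Collect prefix -> namespace IRI mappings from PROV-N text, one declaration per line."""
    prefixes = {}
    for line in provn_text.splitlines():
        match = PREFIX_DECL.match(line)
        if match:
            prefixes[match.group(1)] = match.group(2)
    return prefixes


def expand(qualified_name, prefixes):
    """Expand a qualified name by concatenating its namespace IRI and local part."""
    prefix, _, local = qualified_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"Undeclared prefix: {prefix}")
    return prefixes[prefix] + local


if __name__ == "__main__":
    declared = parse_prefixes(
        "prefix ex <http://example.com/>\nprefix exg <http://example.org/government>"
    )
    print(expand("ex:alice", declared))
    print(expand("exg:budget", declared))
```

With these declarations `ex:alice` expands to `http://example.com/alice`, while `exg:budget` expands to `http://example.org/governmentbudget`, because the `exg` namespace IRI has no trailing separator.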
Validating and visualising PROV This blog post gives a gentle PROV-N introduction and then explores tools for validating and visualising PROV. One of the advantages of W3C PROV having a common data model is that it can be serialized, or written out, in multiple file formats. The PROV family of W3C specifications describes the mappings PROV-XML and PROV-O (which, being based on OWL2, itself has multiple serializations for Linked Data, including the RDF formats Turtle and JSON-LD).
Tracking versions with PAV The PAV ontology specializes the W3C PROV-O standard to give a lightweight approach to recording details about a resource, giving its Provenance, Authorship and Versioning. Our paper on PAV explores all of these aspects in detail. In this blog post we discuss Versioning as modelled by PAV, including their hierarchical organization. Contents: Version numbers; Semantic versioning; Making versions retrievable; Ordering previous versions; Providing provenance for each version; Related work; PROV-O revisions; Qualified revisions; DC Terms; schema.
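Ordering previous versions can be illustrated by following `pav:previousVersion` links from the newest version back to the first. In this Python sketch the links are held in a plain dictionary (in practice they would come from RDF statements), and the version IRIs are hypothetical.

```python
def version_history(newest, previous_version):
    """Follow pav:previousVersion links from `newest` back to the oldest version.

    `previous_version` maps a version IRI to the IRI it declares as its
    pav:previousVersion. Returns the versions ordered oldest first.
    """
    history = [newest]
    seen = {newest}
    current = newest
    while current in previous_version:
        current = previous_version[current]
        if current in seen:  # guard against malformed, circular version chains
            raise ValueError(f"Cycle in pav:previousVersion chain at {current}")
        seen.add(current)
        history.append(current)
    history.reverse()
    return history


if __name__ == "__main__":
    links = {
        "http://example.com/dataset/v3": "http://example.com/dataset/v2",
        "http://example.com/dataset/v2": "http://example.com/dataset/v1",
    }
    print(version_history("http://example.com/dataset/v3", links))
```

Because each version only points to its immediate predecessor, the full history can be recovered from any later version without a central version list.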
PAV Ontology paper highly accessed Our recent paper about the PAV ontology has been classified as highly accessed by the Journal of Biomedical Semantics, with more than 1097 views since it was published two months ago, and an Altmetric score of 12. The PAV ontology provides a lightweight approach to recording typical Provenance, Authorship and Versioning information, and builds upon existing standards like PROV-O and DC Terms. Our previous Practical Provenance post gives a brief overview of PAV, but you might also want to explore these links for more details:
Resources that change state The PROV working group received a question from Mike: My understanding is that an entity referenced in a PROV bundle (e.g. via wasGeneratedBy) must be in the bundle… but I do not wish to duplicate entity definitions throughout my bundles. My entities are long-lived and will exist in multiple bundles. So let's say I have a resource for alarms which contains a list of all alarms my company monitors.
PROV released as W3C Recommendations The Provenance Working Group was chartered to develop a framework for interchanging provenance on the Web. The Working Group has now published the PROV Family of Documents as W3C Recommendations, along with corresponding supporting notes. You can find a complete list of the documents in the PROV Overview Note. PROV enables one to represent and interchange provenance information using widely available formats such as RDF and XML. In addition, it provides definitions for accessing provenance information, validating it, and mapping to Dublin Core.
Locating provenance for a RESTful web service This blog post shows how RESTful web services can provide, and link to, provenance data for their exposed resources by using the PROV-AQ mechanism of HTTP Link headers. This is demonstrated by showing how to update a hello world REST service implemented with Java and JAX-RS 2.0 to provide these links. The PROV-AQ HTTP mechanism is easiest explained by an example: GET http://example.com/resource.html HTTP/1.1 Accept: text/html HTTP/1.1 200 OK Content-type: text/html Link: <http://example.
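On the client side, a PROV-AQ aware client looks for Link headers with the relation `http://www.w3.org/ns/prov#has_provenance`, where the optional `anchor` parameter names the resource the provenance is about (otherwise the requested resource). The Python sketch below uses a deliberately simplified parser rather than a complete Link header implementation: it assumes parameter values contain no `;` or `,`. The example URIs are hypothetical.

```python
import re

HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance"

# One link-value: <target> followed by zero or more ;-separated parameters.
# Simplification: parameter values must not contain ';' or ','.
LINK_VALUE = re.compile(r"<([^>]*)>((?:\s*;\s*[^;,]+)*)")


def provenance_links(link_header, requested_uri):
    """Return (provenance_uri, target_uri) pairs found in an HTTP Link header.

    target_uri is the anchor parameter if present, otherwise the requested resource.
    """
    found = []
    for match in LINK_VALUE.finditer(link_header):
        target, params_text = match.group(1), match.group(2)
        params = {}
        for param in params_text.split(";"):
            key, sep, value = param.partition("=")
            if sep:
                params[key.strip().lower()] = value.strip().strip('"')
        # rel may hold several space-separated relation types
        if HAS_PROVENANCE in params.get("rel", "").split():
            found.append((target, params.get("anchor", requested_uri)))
    return found


if __name__ == "__main__":
    header = ('<http://example.com/resource/provenance>; '
              'rel="http://www.w3.org/ns/prov#has_provenance"; '
              'anchor="http://example.com/resource.html", '
              '<http://example.com/page2>; rel="next"')
    print(provenance_links(header, "http://example.com/resource.html"))
```

Links with other relations, such as `rel="next"` above, are ignored, so a service can mix PROV-AQ links with its existing Link headers.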
W3C PROV Implementations: Preliminary Analysis By Khalid Belhajjame, syndicated from https://khalidbelhajjame.wordpress.com/2013/04/04/w3c-prov-implementations/ At the beginning of December 2012, the W3C Provenance Working Group issued a call for implementations. As of 25 February 2013, 64 PROV implementations had been reported to the W3C Provenance Working Group. These implementations took different forms, ranging from standalone applications (30), to reusable frameworks and libraries (10), to services hosted by third parties (9), to vocabularies (21), and constraint validation modules (3).
Recording authorship, curation and digital creation with the PAV ontology PAV is a lightweight ontology for tracking Provenance, Authoring and Versioning. PAV supplies terms for distinguishing between the different roles of the agents contributing content in current web-based systems: contributors, authors, curators and digital artifact creators. The ontology also provides terms for tracking the provenance of digital entities that are published on the web and then accessed, transformed and consumed.
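As an illustration of these role distinctions, the Python sketch below writes PAV statements as Turtle using `pav:authoredBy`, `pav:curatedBy`, `pav:contributedBy` and `pav:createdBy` in the PAV namespace `http://purl.org/pav/`. The dataset and person IRIs are hypothetical, and a real application would more likely use an RDF library than string formatting.

```python
PAV_NS = "http://purl.org/pav/"
# PAV properties for the agent roles: authors, curators, contributors, digital creators.
PAV_ROLE_PROPERTIES = ("authoredBy", "curatedBy", "contributedBy", "createdBy")


def pav_roles_turtle(resource, roles):
    """Render PAV role statements about `resource` as a Turtle snippet.

    `roles` maps a PAV property local name to a non-empty list of agent IRIs.
    """
    if not roles:
        raise ValueError("At least one role is required")
    unknown = set(roles) - set(PAV_ROLE_PROPERTIES)
    if unknown:
        raise ValueError(f"Not a PAV role property: {sorted(unknown)}")
    statements = []
    for prop, agents in roles.items():
        if not agents:
            raise ValueError(f"No agents given for pav:{prop}")
        objects = ", ".join(f"<{agent}>" for agent in agents)
        statements.append(f"    pav:{prop} {objects}")
    body = " ;\n".join(statements)
    return f"@prefix pav: <{PAV_NS}> .\n\n<{resource}>\n{body} .\n"


if __name__ == "__main__":
    print(pav_roles_turtle("http://example.com/dataset/1", {
        "authoredBy": ["http://example.com/people/alice"],
        "curatedBy": ["http://example.com/people/bob"],
    }))
```

Keeping `pav:authoredBy` (intellectual content) separate from `pav:createdBy` (the digital artifact) lets the same resource credit both the author of the content and whoever produced the file.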
Tutorial on the W3C PROV family of specifications Posted by Khalid Belhajjame Provenance, a form of structured metadata designed to record the origin or source of information, can be instrumental in deciding whether information is to be trusted, how it can be integrated with other diverse information sources, and how to establish attribution of information to authors throughout its history. The PROV set of specifications, produced by the World Wide Web Consortium (W3C), is designed to promote the publication of provenance information on the Web, and offers a basis for interoperability across diverse provenance management systems.
What can provenance do for me? 2013-03-21 Also available on Slideshare, as PDF and as PPTX. This presentation was originally given at the Metagenomics, metagenetics and Phylogenetic workflows for Ocean Sampling Day Workshop at the Max Planck Institute for Marine Microbiology on 2013-03-21 by Stian Soiland-Reyes. Reuse allowed under the Creative Commons Attribution 3.0 license.
Preprint
Applying the FAIR Principles to Computational Workflows arXiv preprint submitted to Scientific Data
Enhancing Research Data Management in Galaxy and Data Stewardship Wizard by utilising RO-Crates BioHackrXiv preprint from ELIXIR BioHackathon 2022
Implementing FAIR Digital Objects in the EOSC-Life Workflow Collaboratory Zenodo white paper
Linking provenance and its metadata in multi-organizational environments of life sciences Preprint submitted to PeerJ CS
WorkflowHub: a registry for computational workflows arXiv preprint
Presentation
What can provenance do for me? 2013-03-21 Also available on Slideshare, as PDF and as PPTX. This presentation was originally given at the Metagenomics, metagenetics and Phylogenetic workflows for Ocean Sampling Day Workshop at the Max Planck Institute for Marine Microbiology on 2013-03-21 by Stian Soiland-Reyes. Reuse allowed under the Creative Commons Attribution 3.0 license.
Project
Wf4Ever project Wf4Ever was a research project funded by the EU Framework 7 programme to investigate how scientific workflows and their data could be better preserved for reproducibility, reuse and resilience against workflow decay.
Publication
Evaluating FAIR Digital Object and Linked Data as distributed object systems Journal article published in PeerJ Computer Science
Packaging research artefacts with RO-Crate Journal article published in Data Science
Making Canonical Workflow Building Blocks interoperable across workflow languages Journal article published in Data Intelligence
The Specimen Data Refinery: A canonical workflow framework and FAIR Digital Object approach to speeding up digital mobilisation of natural history collections Journal article published in Data Intelligence
Recording provenance of workflow runs with RO-Crate Journal article published in PLOS One
Tracking workflow execution with TavernaProv

Apache Taverna is a scientific workflow system for combining web services and local tools. Taverna records provenance of workflow runs, intermediate values and user interactions, both as an aid for debugging while designing the workflow and as a record for later reproducibility and comparison.

Taverna also records provenance of the evolution of the workflow definition (including a chain of wasDerivedFrom relations), attributions and annotations; for brevity, here we focus on how Taverna’s workflow run provenance extends PROV and is embedded with Research Objects.
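The general PROV shape of a workflow run, independent of Taverna's own extensions, can be sketched in PROV-N: the run is an activity associated with the workflow engine as a software agent, following the workflow definition as its plan, using inputs and generating outputs. This Python sketch emits such a document; all identifiers are hypothetical and the output is not what TavernaProv actually produces.

```python
def workflow_run_provn(run, workflow, engine, inputs, outputs):
    """Return a generic PROV-N document describing one workflow run."""
    lines = [
        "document",
        "  prefix ex <http://example.com/>",
        f"  activity({run})",
        f"  agent({engine}, [prov:type='prov:SoftwareAgent'])",
        f"  entity({workflow}, [prov:type='prov:Plan'])",
        # The third argument of wasAssociatedWith is the plan being followed.
        f"  wasAssociatedWith({run}, {engine}, {workflow})",
    ]
    for item in inputs:
        lines.append(f"  entity({item})")
        lines.append(f"  used({run}, {item}, -)")  # '-' = time not recorded
    for item in outputs:
        lines.append(f"  entity({item})")
        lines.append(f"  wasGeneratedBy({item}, {run}, -)")
    lines.append("endDocument")
    return "\n".join(lines)


if __name__ == "__main__":
    print(workflow_run_provn("ex:run1", "ex:workflow", "ex:engine",
                             ["ex:input1"], ["ex:output1"]))
```

Modelling the workflow definition as a `prov:Plan` keeps the run linked to the exact definition that was executed, which is what a chain of wasDerivedFrom relations between definitions can then build upon.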

BioHackEU23 report: Enabling FAIR Digital Objects with RO-Crate, Signposting and Bioschemas BioHackrXiv preprint from ELIXIR BioHackathon 2023
Datasets A brief list of datasets and research objects (co)created by Stian Soiland-Reyes (work in progress)
FAIR Computational workflows Journal article published in Data Intelligence
Federated causal inference based on real-world observational data sources: application to a SARS-CoV-2 vaccine effectiveness assessment Journal article in BMC Medical Research Methodology
Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language Journal article published in Communications of the ACM
Other documents Reports, specifications and deliverables (co)authored by Stian Soiland-Reyes (work in progress)
Posters Posters presented by Stian Soiland-Reyes at conferences (work in progress)
Presentations Presentations and talks presented by Stian Soiland-Reyes at conferences and meetings (work in progress)
Publications by Stian Soiland-Reyes Academic publications by Stian Soiland-Reyes, including journal articles, conference papers/abstracts and theses.
Scientific workflows, community roadmap, data management, AI workflows, exascale computing, interoperability Preprint for proceedings article for 2021 IEEE Workshop on Workflows in Support of Large-Scale Science
Semantic micro-contributions with decentralized nanopublication services Journal article published in PeerJ Computer Science
Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv Journal article published in GigaScience
Software Software (co)developed or maintained by Stian Soiland-Reyes (work in progress)
Ten Simple Rules for making a software tool workflow-ready Journal article published in PLOS Computational Biology
Report
Report on FAIR Signposting and its Uptake by the Community EOSC Task Force report