PhD
Besides working at The University of Manchester, I am a PhD candidate in the INDElab at the Informatics Institute of the University of Amsterdam.
My supervisor is Paul Groth and my thesis title is FAIR Research Objects and Computational Workflows – A Linked Data Approach.
I started in July 2019 and concluded in December 2023, with the final manuscript submitted in May 2024. The formal doctoral thesis defence is scheduled for January 2025.
As I did a thesis by publication, these pages gather the articles submitted/accepted/published that became part of my PhD thesis. Note that some of these articles have been slightly reformatted to fit publication on the Web; see also the PDF draft in Zenodo.
- Cite as: https://doi.org/10.5281/zenodo.8113625
- License: Creative Commons Attribution 4.0 International
- RO-Crate: https://w3id.org/ro/doi/10.5281/zenodo.8113625
- Status: Admission to the doctoral defence (see flowchart)
FAIR Research Objects and Computational Workflows – A Linked Data Approach
This PhD thesis explores the topics of RO-Crate, FAIR Digital Objects (FDOs), and computational workflows, in order to examine how these can be implemented and integrated using Linked Data approaches – forming “FAIR Research Objects”.
The background covers the evolution of the Semantic Web, Linked Data, and FAIR Digital Objects, which are then evaluated against the FAIR principles (Findable, Accessible, Interoperable, Reusable) and several frameworks, to consider these technologies as potential middleware for a global distributed object system. The positive outcome shows that it is possible to achieve the ultimate goal of machine-actionable research outputs.
This work introduces the broader community-developed method RO-Crate for packaging research artefacts with their contextual information, relationships and metadata – using Linked Data standards that have been simplified and documented in detail for easier adaptation by software developers. The tension between freedom for implementations and rigidity of semantic constraints is explored, and demonstrated by various profiles of RO-Crate that have been implemented across research domains such as bioinformatics, regulatory sciences, biodiversity and digital humanities.
Computational workflows, commonly used by scientists for reproducible data analysis across execution platforms, are then examined as potential FAIR Digital Objects. Workflows are considered as shareable research outputs (by capturing the computational method for later reuse) and as part of the provenance of computational results, captured in a profile of RO-Crate. Additionally, the concept of Canonical Workflow Building Blocks is introduced as a method for FAIR sharing of tools across different workflow systems. A case study from natural history museums and biodiversity shows how the combination of workflows and RO-Crate can be used to annotate digitised specimens step by step, and gradually build reproducible domain-specific FDOs.
The discussion part of this thesis explores how the emerging ecosystem of FAIR Digital Objects can build on the results from the collaborative development of RO-Crate to carefully adapt “just enough” of Linked Data technologies with a balance of flexibility and predictability. Future directions for RO-Crate are examined, including new adaptations and further alignments with FAIR and FDO principles. Lessons from computational workflows further inform directions of FDO and RO-Crate.
The main findings of this thesis conclude that Web approaches can achieve the goals of FDO, by using existing standards with sufficient constraints that give developers predictability and necessary flexibility. The lightweight Linked Data recommendations of RO-Crate are shown to be implementable for a range of applications, supporting advancement of the FAIR principles through practical and interoperable use of Web standards.
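As a rough illustration of the packaging approach summarised above, the following sketch builds a minimal RO-Crate metadata document in Python. It assumes the RO-Crate 1.1 conventions (a JSON-LD file named `ro-crate-metadata.json` containing a metadata descriptor and a root dataset). The function name `minimal_ro_crate`, the file names and all descriptive values are hypothetical examples and are not taken from the thesis.

```python
import json


def minimal_ro_crate(name, description, date_published, license_uri, files):
    """Return a minimal RO-Crate 1.1 metadata document as a dict.

    `files` maps a relative file path to a human-readable name.
    All argument values used below are illustrative assumptions.
    """
    graph = [
        # Metadata descriptor: identifies this file as RO-Crate metadata
        # and points at the root dataset it describes.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root dataset: the crate itself, listing its data entities.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_uri},
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # One data entity per packaged file.
    for path, label in files.items():
        graph.append({"@id": path, "@type": "File", "name": label})
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": graph,
    }


crate = minimal_ro_crate(
    name="Example analysis results",            # hypothetical
    description="Outputs of an example workflow run",
    date_published="2024-05-01",
    license_uri="https://creativecommons.org/licenses/by/4.0/",
    files={"results.csv": "Result table", "workflow.cwl": "Workflow definition"},
)
print(json.dumps(crate, indent=2))
```

Writing the printed JSON to a file named `ro-crate-metadata.json` next to the listed files would give a directory that follows the basic RO-Crate layout. Profiles such as those for workflows add further constraints on top of this base structure.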
-
Glossary Glossary of terms and acronyms used in my PhD thesis
-
Chapter 1: Introduction Introduction and overview of the PhD thesis and its research questions
-
Chapter 2: Background In this chapter, we discuss the related work with respect to FAIR Digital Objects and Linked Data. We do so by looking through the lens of development of these technologies over time, including future directions.
-
Chapter 3: FAIR Digital Objects and Linked Data To investigate RQ1 this chapter evaluates both Linked Data and FAIR Digital Object (FDO) as ways to realize the FAIR principles.
-
Evaluating FAIR Digital Object and Linked Data as distributed object systems Journal article published in PeerJ Computer Science
-
Updating Linked Data practices for FAIR Digital Object principles Talk abstract presented at FAIR Digital Objects conference (FDO2022)
-
Chapter 4: RO-Crate This chapter introduces RO-Crate, a pragmatic method of packaging data alongside structured metadata that is in line with the FAIR principles. This has been implemented to investigate RQ2.
-
Packaging research artefacts with RO-Crate Journal article published in Data Science
-
Creating lightweight FAIR Digital Objects with RO-Crate Poster abstract presented at FAIR Digital Objects conference (FDO2022)
-
Formalizing RO-Crate in First Order Logic Appendix from Journal article published in Data Science
-
Chapter 5: Computational Workflows To investigate RQ3, and considering that important parts of the FAIR principles include reuse and provenance, this chapter examines in closer detail how FAIR Digital Objects and RO-Crate can be used with computational workflows.
-
Making Canonical Workflow Building Blocks interoperable across workflow languages Journal article published in Data Intelligence
-
The Specimen Data Refinery: A canonical workflow framework and FAIR Digital Object approach to speeding up digital mobilisation of natural history collections Journal article published in Data Intelligence
-
Incrementally building FAIR Digital Objects with Specimen Data Refinery workflows Poster abstract presented at FAIR Digital Objects conference (FDO2022)
-
Recording provenance of workflow runs with RO-Crate Preprint resubmitted to PLOS One following peer review
-
Chapter 6: Discussion and conclusions Overall consideration of this thesis.
-
Discussion Discussion of findings from this thesis, relating them to emerging related work and future directions.
-
Conclusions Conclusions for the research questions raised in the introduction.
-
References An aggregated list of references from the chapters of this PhD thesis.
-
Appendix A: Acknowledgements Acknowledgements of PhD
-
Appendix B: Contributions Contributions for each chapter of this thesis, and listing all the other contributors and their affiliations.
-
Appendix C: Supplements Supplementary publications from my PhD thesis
-
Ten Simple Rules for making a software tool workflow-ready Journal article published in PLOS Computational Biology
-
Enhancing RDM in Galaxy by integrating RO-Crate Poster abstract presented at FAIR Digital Objects conference (FDO2022)
-
Implementing FAIR Digital Objects in the EOSC-Life Workflow Collaboratory Zenodo white paper
-
Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language Journal article published in Communications of the ACM
-
Semantic micro-contributions with decentralized nanopublication services Journal article published in PeerJ Computer Science
-
ISO 23494: Biotechnology – Provenance Information Model for Biological Specimen and Data Poster abstract accepted for Provenance Week 2020
-
Scientific workflows, community roadmap, data management, AI workflows, exascale computing, interoperability Preprint for proceedings article for 2021 IEEE Workshop on Workflows in Support of Large-Scale Science
-
RO-Crate, a lightweight approach to Research Object data packaging Conference abstract presented at Workshop on Research Objects 2019 (RO2019)
-
Federated causal inference based on real-world observational data sources: application to a SARS-CoV-2 vaccine effectiveness assessment Journal article in BMC Medical Research Methodology
-
Linking provenance and its metadata in multi-organizational environments of life sciences Preprint submitted to PeerJ CS
-
Enhancing Research Data Management in Galaxy and Data Stewardship Wizard by utilising RO-Crates BioHackrXiv preprint from ELIXIR BioHackathon 2022