PhD

Besides working at The University of Manchester, I am a PhD candidate in the INDElab at the Informatics Institute of the University of Amsterdam.

My supervisors are Paul Groth and Carole Goble. My thesis title is FAIR Research Objects and Computational Workflows – A Linked Data Approach.

I started in July 2019 and concluded in December 2023, with the final manuscript submitted in May 2024. The formal doctoral thesis defence is scheduled for 15th January 2025.

As I did a thesis by publication, these pages gather the articles (submitted, accepted or published) that became part of my PhD thesis. Note that some of these articles have been slightly reformatted to fit publication on the Web or in the thesis PDF (e.g. using the s11 Citation house style); see the appendix Contributions for the full list of sources and editorial changes.

Book cover: FAIR Research Objects and Computational Workflows

FAIR Research Objects and Computational Workflows – A Linked Data Approach

This PhD thesis explores RO-Crate, FAIR Digital Objects (FDOs) and computational workflows, examining how these can be implemented and integrated using Linked Data approaches to form "FAIR Research Objects".

The background covers the evolution of the Semantic Web, Linked Data and FAIR Digital Objects. These are then evaluated against the FAIR principles (Findable, Accessible, Interoperable, Reusable) and several frameworks, in order to assess these technologies as potential middleware for a global distributed object system. The positive outcome of this evaluation shows that the ultimate goal of machine-actionable research outputs is achievable.

This work introduces RO-Crate, a method developed by a broad community for packaging research artefacts together with their contextual information, relationships and metadata. RO-Crate uses Linked Data standards that have been simplified and documented in detail for easier adoption by software developers. The tension between freedom for implementations and rigidity of semantic constraints is explored, and illustrated by various RO-Crate profiles that have been implemented across research domains such as bioinformatics, regulatory sciences, biodiversity and digital humanities.
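As a rough illustration of the packaging idea, the Python sketch below builds a minimal RO-Crate metadata document (`ro-crate-metadata.json`) using only the standard library. It follows the RO-Crate 1.1 layout of a metadata descriptor, a root `Dataset` and one data entity, all in a single JSON-LD `@graph`. The file `data.csv`, its description and the chosen licence are hypothetical placeholders, and the helper functions `build_crate` and `missing_parts` are illustrative names, not part of any RO-Crate library or of the tooling described in the thesis.

```python
import json


def build_crate() -> dict:
    """Return a minimal RO-Crate 1.1 metadata document as a Python dict.

    The data entity "data.csv" and its properties are hypothetical examples.
    """
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            # Metadata descriptor: says which specification this file conforms to
            # and which entity (the root dataset "./") it describes.
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            # Root data entity: the crate as a whole, linking to its parts.
            {
                "@id": "./",
                "@type": "Dataset",
                "name": "Example analysis",
                "description": "Hypothetical crate packaging one input file",
                "datePublished": "2024-05-01",
                "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
                "hasPart": [{"@id": "data.csv"}],
            },
            # Data entity: contextual metadata about one packaged file.
            {
                "@id": "data.csv",
                "@type": "File",
                "name": "Input measurements",
                "encodingFormat": "text/csv",
            },
        ],
    }


def missing_parts(crate: dict) -> list:
    """List hasPart references from the root dataset that lack a described entity."""
    graph = crate["@graph"]
    described = {entity["@id"] for entity in graph}
    root = next(entity for entity in graph if entity["@id"] == "./")
    return [ref["@id"] for ref in root.get("hasPart", []) if ref["@id"] not in described]


if __name__ == "__main__":
    crate = build_crate()
    print(json.dumps(crate, indent=2))
    print("Undescribed parts:", missing_parts(crate))
```

Keeping every entity in one flat `@graph` with `@id` references, rather than nesting objects, is what lets plain JSON tooling and Linked Data tooling read the same file; the `missing_parts` check is a small example of the kind of predictability such a convention gives developers.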

Computational workflows, commonly used by scientists for reproducible data analysis across execution platforms, are then examined as potential FAIR Digital Objects. Workflows are considered both as shareable research outputs (capturing the computational method for later reuse) and as part of the provenance of computational results, as captured in a profile of RO-Crate. Additionally, the concept of Canonical Workflow Building Blocks is introduced as a method for FAIR sharing of tools across different workflow systems. A case study from natural history museums and biodiversity shows how the combination of workflows and RO-Crate can be used to annotate digitised specimens step by step, and gradually build reproducible domain-specific FDOs.

The discussion part of this thesis explores how the emerging ecosystem of FAIR Digital Objects can build on the results from the collaborative development of RO-Crate to carefully adopt "just enough" Linked Data technology, balancing flexibility and predictability. Future directions for RO-Crate are examined, including new adaptations and further alignment with the FAIR and FDO principles. Lessons from computational workflows further inform the directions of FDO and RO-Crate.

The main findings of this thesis conclude that Web approaches can achieve the goals of FDO by using existing standards with sufficient constraints to give developers predictability while retaining the necessary flexibility. The lightweight Linked Data recommendations of RO-Crate are shown to be implementable for a range of applications, supporting the advancement of the FAIR principles through practical and interoperable use of Web standards.