Simone Leo, Michael R. Crusoe, Laura Rodríguez-Navas, Raül Sirvent, Alexander Kanitz, Paul De Geest, Rudolf Wittner, Luca Pireddu, Daniel Garijo, José M. Fernández, Iacopo Colonnelli, Matej Gallo, Tazro Ohta, Hirotaka Suetake, Salvador Capella-Gutierrez, Renske de Wit, Bruno de Paula Kinoshita, Stian Soiland-Reyes (2023):
Recording provenance of workflow runs with RO-Crate.
- License: Creative Commons Attribution License (CC BY 4.0).
- Modifications: Formatting as Markdown, abstract only
Recording provenance of workflow runs with RO-Crate
Simone Leo1, Michael R. Crusoe2,3,4, Laura Rodríguez-Navas5, Raül Sirvent5, Alexander Kanitz6,7, Paul De Geest8, Rudolf Wittner9,10,11, Luca Pireddu1, Daniel Garijo12, José M. Fernández5, Iacopo Colonnelli13, Matej Gallo9, Tazro Ohta14,15, Hirotaka Suetake16, Salvador Capella-Gutierrez5, Renske de Wit2, Bruno de Paula Kinoshita5, Stian Soiland-Reyes17,18
1 Center for Advanced Studies, Research, and Development in Sardinia (CRS4), Loc. Piscina Manna, Edificio 1, 09050 Pula (CA), Italy
2 Vrije Universiteit Amsterdam, The Netherlands
3 DTL Projects, The Netherlands
4 Forschungszentrum Jülich, Germany
5 Barcelona Supercomputing Center, Spain
6 Biozentrum, University of Basel, Switzerland
7 Swiss Institute of Bioinformatics, Lausanne, Switzerland
8 VIB-UGent Center for Plant Systems Biology, Gent, Belgium
9 Faculty of Informatics, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
10 Institute of Computer Science, Masaryk University, Šumavská 416/15, 602 00, Brno, Czech Republic
11 BBMRI-ERIC, Neue Stiftingtalstrasse 2, 8010, Graz, Austria
12 Ontology Engineering Group, Universidad Politécnica de Madrid
13 Università degli Studi di Torino, Computer Science Dept. Corso Svizzera 185, 10149, Torino, Italy
14 Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Shizuoka, Japan
15 Institute for Advanced Academic Research, Chiba University, Chiba, Japan
16 Department of Creative Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
17 Department of Computer Science, The University of Manchester, Manchester, United Kingdom
18 Informatics Institute, Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands
Recording the provenance of scientific computation results is fundamental to supporting traceability, reproducibility and quality assessment of data products. Several models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, these models lack interoperability, flexibility and support from workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) that captures the provenance of the execution of computational workflows at different levels of granularity. We describe the model, as well as its implementations in workflow systems, and show its applicability to machine learning for digital pathology use cases. The model is developed by a diverse, open community that holds regular meetings to discuss requirements and practical aspects. The format is already in use in several workflow managers, including Galaxy, allowing interoperable comparisons between runs from heterogeneous systems.
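To give a rough idea of what capturing a run as an RO-Crate might look like, the following Python sketch builds a minimal `ro-crate-metadata.json` document in which a single execution is described by a schema.org `CreateAction` that links the workflow (`instrument`), its inputs (`object`) and its outputs (`result`). It is an illustration under stated assumptions, not the model defined by the authors: the helper `build_run_crate`, the file names, the `#run-1` identifier and the timestamps are all hypothetical, and the use of `mentions` on the root dataset is assumed here as the way to point at the run. A real Workflow Run RO-Crate would also declare the profile it conforms to and carry richer metadata.

```python
import json


def build_run_crate(workflow_id, input_ids, output_ids, start, end):
    """Return a dict shaped like a minimal ro-crate-metadata.json.

    Hypothetical helper for illustration only; it is not part of any
    RO-Crate library.
    """
    # One File entity per input and output (names are placeholders).
    files = [{"@id": i, "@type": "File"} for i in [*input_ids, *output_ids]]

    # The workflow itself, typed as in Workflow RO-Crate.
    workflow = {
        "@id": workflow_id,
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
    }

    # The run: a schema.org CreateAction tying workflow, inputs and outputs.
    action = {
        "@id": "#run-1",  # assumed local identifier
        "@type": "CreateAction",
        "instrument": {"@id": workflow_id},
        "object": [{"@id": i} for i in input_ids],
        "result": [{"@id": o} for o in output_ids],
        "startTime": start,
        "endTime": end,
    }

    # Root dataset lists its files and points at the run (assumption).
    root = {
        "@id": "./",
        "@type": "Dataset",
        "hasPart": [{"@id": e["@id"]} for e in [workflow, *files]],
        "mentions": [{"@id": action["@id"]}],
    }

    # Metadata descriptor required by RO-Crate 1.1.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }

    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, workflow, action, *files],
    }


crate = build_run_crate(
    "workflow.cwl",
    ["input.csv"],
    ["output.csv"],
    "2023-01-01T10:00:00Z",
    "2023-01-01T10:05:00Z",
)
print(json.dumps(crate, indent=2))
```

Because every entity is a flat JSON-LD node referenced by `@id`, two runs produced by different workflow managers can be compared by looking up their `CreateAction` entities and following the same three properties.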
(Work in progress)
This manuscript is under development by the Workflow Run Crate working group and will be published as a preprint once submitted.