ISO 23494: Biotechnology – Provenance Information Model for Biological Specimen and Data

Cite as

Rudolf Wittner, Petr Holub, Heimo Müller, Joerg Geiger, Carole Goble, Stian Soiland-Reyes, Luca Pireddu, Francesca Frexia, Cecilia Mascia, Elliot Fairweather, Jason R. Swedlow, Josh Moore, Caterina Strambio, David Grunwald, Hiroki Nakae (2021):
ISO 23494: Biotechnology – Provenance Information Model for Biological Specimen and Data.
In: Glavic B., Braganholo V., Koop D. (eds) Provenance and Annotation of Data and Processes (IPAW 2020/2021).
Lecture Notes in Computer Science 12839.
https://doi.org/10.1007/978-3-030-80960-7_16

ISO 23494: Biotechnology – Provenance Information Model for Biological Specimen and Data

Rudolf Wittner (1,2), Petr Holub (1,2), Heimo Müller (3), Joerg Geiger (4), Carole Goble (5), Stian Soiland-Reyes (5,11), Luca Pireddu (6), Francesca Frexia (6), Cecilia Mascia (6), Elliot Fairweather (7), Jason R. Swedlow (8), Josh Moore (8), Caterina Strambio (9), David Grunwald (9), Hiroki Nakae (10)

1 BBMRI-ERIC, AUT
2 Institute of Computer Science & Faculty of Informatics, Masaryk University, CZ
3 Medical University Graz, AUT
4 Interdisciplinary Bank of Biomaterials and Data Würzburg (ibdw), Würzburg, DE
5 Department of Computer Science, The University of Manchester, UK
6 CRS4 – Center for Advanced Studies, Research and Development in Sardinia, IT
7 King’s College London, UK
8 School of Life Sciences, University of Dundee, Dundee, UK
9 University of Massachusetts, US
10 Japan bio-Measurement and Analysis Consortium, JPN
11 Informatics Institute, University of Amsterdam, NL

Introduction

Research in life sciences has undergone significant changes in recent years, evolving from individual projects confined to small research groups into transnational consortia covering a wide range of techniques and expertise. At the same time, several reports addressing the quality of research papers in life sciences have uncovered an alarming number of ill-founded claims.

The reasons for these deficiencies are diverse, with insufficient quality and documentation of the biological material used being the major issue [1] [2] [3]. Hence there is an urgent need for standardized and comprehensive documentation of the whole workflow, from the collection, generation, processing and analysis of the biological material to data analysis and integration.

The PROV [4] family of documents serves as a current standard for provenance information, i.e. information describing the history of an object. However, as discussed in the results of the EHR4CR and TRANSFoRm projects [5] [6], its implementation in the biotechnology domain, and in biomedical research in particular, is still a pending issue.
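For readers unfamiliar with PROV, its core idea can be sketched in a few lines of Python: entities (things), activities (processes), and relations such as `used` and `wasGeneratedBy` that link them. The sketch below is illustrative only. The `ProvDocument` class, its `history` helper and the `ex:` identifiers are hypothetical names invented for this example; they are not part of PROV, of any PROV library, or of ISO 23494.

```python
from dataclasses import dataclass, field


@dataclass
class ProvDocument:
    """A toy container for PROV-style statements (not a real PROV library)."""
    entities: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    used: list = field(default_factory=list)          # (activity, entity) pairs
    generated_by: list = field(default_factory=list)  # (entity, activity) pairs

    def history(self, entity):
        """Return the entities that `entity` transitively depends on."""
        seen = []
        frontier = [entity]
        while frontier:
            current = frontier.pop()
            for generated, activity in self.generated_by:
                if generated != current:
                    continue
                # Every input of the generating activity is part of the history.
                for act, source in self.used:
                    if act == activity and source not in seen:
                        seen.append(source)
                        frontier.append(source)
        return seen


# Hypothetical chain: blood sample -> DNA extract -> sequence data.
doc = ProvDocument()
doc.entities |= {"ex:blood-sample-1", "ex:dna-extract-1", "ex:sequence-data-1"}
doc.activities |= {"ex:dna-extraction", "ex:sequencing"}
doc.used += [
    ("ex:dna-extraction", "ex:blood-sample-1"),
    ("ex:sequencing", "ex:dna-extract-1"),
]
doc.generated_by += [
    ("ex:dna-extract-1", "ex:dna-extraction"),
    ("ex:sequence-data-1", "ex:sequencing"),
]

print(doc.history("ex:sequence-data-1"))
```

Walking the relations backwards from the sequence data recovers the DNA extract and then the original blood sample, which is the kind of "history of an object" that provenance information is meant to capture.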

To address this, the International Organization for Standardization (ISO) initiated the development of a standard, the Provenance Information Model for Biological Specimen and Data. It defines the requirements for interoperable, machine-actionable documentation intended to describe the complete process chain, from the source of biological material through its processing and analysis, and all steps of data generation and data processing, to final data analysis.

The standard is intended for implementers and suppliers of hardware and software tools used in biomedical research (e.g. lab automation devices or analytical devices used for research purposes), and also for organisations adopting the generated provenance (e.g. by requiring or using standardised tools).

Goals of the Standard and Its Structure

The main goals of the standard are to

The proposed structure of the standard reflects the intention to interconnect and integrate distributed provenance information furnished by all kinds of organisations involved in biotechnology research. Examples of such organisations are hospitals, biobanks, research centers, universities, data centers and pharmaceutical companies, each of which participates in research and thus generates provenance information describing its particular activities or contributions. In its current form, the standard is composed of the following six parts:

Part 1
stipulates common requirements for provenance information management in biotechnology to effectuate compatibility of provenance management at all stages of research and defines the design concept of this standard
Part 2
defines a common provenance model which will serve as an overarching principle interconnecting provenance parts generated by all kinds of contributing organisations and enable access to provenance information in a distributed environment
Parts 3, 4 and 5
are meant to complement the horizontal standards (Parts 1 and 2) as vertical standards defining domain-specific provenance models describing diverse stages or areas of research in biotechnology (e.g. sample acquisition and handling; analytical techniques; data management, cleansing and processing; database validation)
Part 6
will contain optional data security extensions especially to address non-repudiation of provenance

The proposed structure is also depicted in Figure 1. Parts indicated by red boxes are horizontal standards, i.e. they provide a common basis for provenance information at all stages of research. The blue boxes indicate domain-specific vertical standards built on top of the horizontal standards.

Figure 1: Overall structure of the standard

Current Status and Future Development

The standard is currently at an early stage of development. The PROV model has already been used to define new types of provenance structures, called connectors, that interconnect provenance generated by different organizations. The concept of connectors and a common mechanism for bundle versioning have been published as an EOSC-Life project provenance deliverable [7]. A publication describing the use of connectors in a specific use case is currently in preparation, and its pre-print will be published in summer 2021 (see footnote 1).
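To make the idea of interconnecting provenance across organizations more concrete, here is a minimal Python sketch. It is one possible reading of the connector concept, not the definition given in the deliverable [7]: a connector is modelled simply as a pointer from an entity in one organization's bundle to the bundle that holds that entity's earlier history. The `Bundle` class, the `trace` function and all identifiers are assumptions made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Bundle:
    """Provenance produced by one organisation (hypothetical structure)."""
    bundle_id: str
    # entity -> the entity it was derived from, within this bundle
    derived_from: dict = field(default_factory=dict)
    # entity -> id of the bundle that holds its earlier history
    # (this mapping stands in for a connector in this sketch)
    connectors: dict = field(default_factory=dict)


def trace(bundles, bundle_id, entity_id):
    """Follow derivations backwards, hopping between bundles via connectors."""
    path = []
    visited = set()
    while (bundle_id, entity_id) not in visited:
        visited.add((bundle_id, entity_id))
        path.append((bundle_id, entity_id))
        bundle = bundles[bundle_id]
        if entity_id in bundle.derived_from:
            # Stay in the same bundle and step to the source entity.
            entity_id = bundle.derived_from[entity_id]
        elif entity_id in bundle.connectors:
            # Same entity, but its history continues in another bundle.
            bundle_id = bundle.connectors[entity_id]
        else:
            break  # no further history recorded
    return path


# Hypothetical scenario: a biobank ships an aliquot to an imaging lab.
biobank = Bundle(
    "bundle:biobank",
    derived_from={"sample:aliquot-7": "sample:tissue-1"},
)
lab = Bundle(
    "bundle:lab",
    derived_from={"data:image-3": "sample:aliquot-7"},
    connectors={"sample:aliquot-7": "bundle:biobank"},
)
bundles = {b.bundle_id: b for b in (biobank, lab)}

for step in trace(bundles, "bundle:lab", "data:image-3"):
    print(step)
```

In this toy setup each organization only stores its own bundle, yet the history of the image can still be followed back to the original tissue because the lab's bundle records where the aliquot's provenance continues.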

The model will be continuously enriched with new types of structures (e.g. relations, entities, etc.) to capture common objects. These structures will subsequently be used to design provenance templates (see footnote 2) that define a common representation of typical scenarios in life sciences.
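A template in the graph-pattern sense can be illustrated with a small Python sketch: a fixed set of statements containing variables, which is turned into concrete provenance by substituting values. The relation names `wasGeneratedBy`, `used` and `wasAssociatedWith` come from PROV; everything else (the `var:` prefix convention, the `instantiate` function, the aliquoting scenario and its identifiers) is an assumption chosen for this example and is not taken from the standard.

```python
# A hypothetical template for "an aliquot is produced from a parent sample".
# Terms starting with "var:" are placeholders to be filled in per instance.
ALIQUOTING_TEMPLATE = [
    ("var:aliquot", "wasGeneratedBy", "var:aliquoting"),
    ("var:aliquoting", "used", "var:parent_sample"),
    ("var:aliquoting", "wasAssociatedWith", "var:technician"),
]


def instantiate(template, bindings):
    """Replace every template variable with its bound value."""
    def resolve(term):
        if term.startswith("var:"):
            name = term[len("var:"):]
            if name not in bindings:
                raise KeyError(f"no binding for template variable {name!r}")
            return bindings[name]
        return term  # constants such as relation names pass through

    return [tuple(resolve(term) for term in triple) for triple in template]


statements = instantiate(
    ALIQUOTING_TEMPLATE,
    {
        "aliquot": "sample:aliquot-7",
        "aliquoting": "activity:aliquoting-2021-03-01",
        "parent_sample": "sample:tissue-1",
        "technician": "agent:technician-42",
    },
)

for statement in statements:
    print(statement)
```

The same template can be instantiated once per aliquoting event, so that repeated occurrences of a scenario share one representation while the concrete identifiers differ.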

Further aspects will also be addressed. The major focus areas are: opaque provenance components; privacy preservation and non-repudiation of provenance information; full syntactic and semantic interoperability of the captured provenance information; and a rigorous formal process for verifying the validity of provenance instances (provable compliance with the proposed model).

Another publication describing the standardization process in more detail is in preparation. It will explain our motivation and the standardization activity itself more fully, describe the structure of the standard in greater detail, and discuss the openness of the standard and related issues.

Presented at Provenance Week 2021

ISO 23494 Poster


Acknowledgements

Supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 654248, project CORBEL; grant agreement No. 824087, project EOSC-Life; and grant agreement No. 823830, project BioExcel-2.

References

[1] Leonard P. Freedman, Iain M. Cockburn, Timothy S. Simcoe (2015):
The economics of reproducibility in preclinical research.
PLOS Biology 13(6):e1002165
https://doi.org/10.1371/journal.pbio.1002165

[2] C. Glenn Begley, John P. A. Ioannidis (2015):
Reproducibility in science: Improving the Standard for Basic and Preclinical Research.
Circulation Research 116
https://doi.org/10.1161/CIRCRESAHA.114.303819

[3] Leonard P. Freedman, James Inglese (2014):
The increasing urgency for standards in basic biologic research.
Cancer Research 74(15)
https://doi.org/10.1158/0008-5472.CAN-14-0925

[4] Paul Groth, Luc Moreau (2013):
PROV-overview. An overview of the PROV family of documents.
W3C Working Group Note 2013-04-30
https://www.w3.org/TR/2013/NOTE-prov-overview-20130430/

[5] Vasa Curcin, Simon Miles, R. Danger, Y. Chen, Richard Bache, Adel Taweel (2014):
Implementing interoperable provenance in biomedical research.
Future Generation Computer Systems 34
https://doi.org/10.1016/j.future.2013.12.001

[6] Gianmauro Cuccuru, Simone Leo, Luca Lianas, Michele Muggiri, Andrea Pinna, Luca Pireddu, Paolo Uva, Alessio Angius, Giorgio Fotia, Gianluigi Zanetti (2014):
An automated infrastructure to support high-throughput bioinformatics.
2014 International Conference on High Performance Computing & Simulation (HPCS)
https://doi.org/10.1109/HPCSim.2014.6903742

[7] Rudolf Wittner, Cecilia Mascia, Francesca Frexia, Heimo Müller, Jörg Geiger, Katrina Exter, Petr Holub (2021):
EOSC-life common provenance model.
Zenodo
https://doi.org/10.5281/zenodo.4705074


  1. See "Toward a common standard for data and specimen provenance in life sciences" and "Linking provenance and its metadata in multi-organizational environments of life sciences"; s11 ed. 2023-09-19

  2. The templates can be considered as synonyms for named graphs or graph patterns. These concepts are used to abstract from actual instances of provenance and to describe repeating occurrences of components of provenance.