Foundations

Key concepts for data management and reproducible science

Published

March 31, 2026

Open data concepts

We recommend handling research data throughout its life cycle as if it were to enter the public domain. Even if, for legal, ethical, or other reasons, the data cannot be made accessible to everyone without restrictions, the metadata describing the data can almost always be shared openly. This means managing research data with a focus on its external reusability, that is, with enough documentation to enable reuse by collaborators outside one’s own research group. To achieve this, it is important to first understand what is meant by research data, the differences between research data management and open research data, and the different stages of the data life cycle.

What is considered research data?

According to the Organisation for Economic Co-operation and Development (OECD) (source from 2021), research data resulting from public funding include:

(…) actual records (such as numerical scores, textual records, images, and sounds) resulting from research that is partially or fully funded by public funds, used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings. This term does not cover laboratory notebooks, preliminary analyses, or drafts of scientific papers, plans for future research, peer reviews, personal communications with colleagues, or physical objects (e.g., laboratory samples, strains of bacteria, or test animals).

Research Data Management (RDM)

RDM can be defined as

(…) the handling of research data (collection, organisation, storage, and documentation) during and after a research activity. (https://www.scienceeurope.org/our-priorities/open-science/research-data-management).

It is often conceptualized with reference to the research data life cycle illustrated in Figure 1. RDM is therefore a generic term that does not necessarily imply open access to research data or alignment with best practices. For instance, RDM could refer to managing data purely internally, without the curation effort needed to make the data reusable when shared.

Figure 1: Research data life cycle according to RDMkit by ELIXIR, 2021-2024

Open Research Data (ORD)

ORD can be defined as

Data that can be freely accessed, reused, remixed and redistributed, for academic research and teaching purposes and beyond. Ideally, open data have no restrictions on reuse or redistribution, and are appropriately licensed as such (Open Science Training Handbook).

This definition focuses on open licensing and the absence of technical access barriers. In practice, however, “openness” also implies a broader sense of accessibility, including comprehensive documentation and metadata, which are an essential prerequisite for any public sharing of research data.

Producing ORD is also required in SNSF-funded projects (see the SNSF policy on ORD). This policy follows the Concordat on Open Research Data, which states that “research data are the evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical).” In addition, since 2017 the SNSF has required a Data Management Plan (DMP) outlining the data life cycle, including where and how the data will be shared and preserved (cf. research policies). Other funders, such as the European Union (Horizon Europe), have similar requirements (cf. OpenAIRE requirements).

The recent project recORD (recognise Open Research Data practices), which focused on research assessment, acknowledged that ORD practices should take into account that “open”, and even “data”, are terms that can have different meanings across disciplines and epistemic cultures (Araujo, Bornatici, and Heers 2024). The project regards ORD practices as integral components of the broader concept of open science and defines them as

(…) practices aimed at facilitating access to and reuse of research data by any interested party, contingent upon specific agreements based on the type of data (Fecher and Friesike 2014).

Reproducibility

One goal of good research data management is to improve the reproducibility of scientific findings, which is crucial for the trustworthiness of empirical research. This code of conduct discusses important steps towards improving reproducibility, such as adopting standards for organising data, providing documentation and metadata, and using interoperable file formats, controlled vocabularies, and version control. Throughout, the code will refer to different aspects of reproducibility, which we introduce in this chapter. Irrespective of exact definitions, we recommend researchers consider scientific reproducibility to be a continuum that benefits from any action and effort that improves transparency and reusability of their research outputs.

“Science is widely regarded as the most reliable way of turning observations into facts. But none of our scientific facts should be based on a single study, or one set of results. Whenever a study claims to have made a new discovery, it is the job of science to ask how we can verify and confirm these results.” (Held and Schwab 2020).

Terminology

(2019) Definitions by the US National Academy of Sciences

The National Academies of Sciences, Engineering, and Medicine discussed the following concepts in their 2019 book on scientific reproducibility and replicability (National Academies of Sciences, Engineering and Medicine 2019):

  • Reproducibility (or computational reproducibility): obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis.

  • Replicability: obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.

  • Generalizability: the extent to which the results of a study apply in contexts or populations that differ from the original one.

A single scientific study may entail one or more of these concepts.
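The computational sense of reproducibility can be made concrete with a small sketch. The analysis step below (a bootstrap estimate of a mean) is purely illustrative and not taken from the text; the point is that fixing the random seed, along with using the same code and input data, is what makes two runs agree exactly.

```python
import random

# A minimal sketch of computational reproducibility: the same input data,
# code, and conditions of analysis (here, a fixed random seed) yield the
# same result on every run. The bootstrap is a hypothetical analysis step.
def bootstrap_mean(data, n_resamples=1000, seed=42):
    """Bootstrap estimate of the mean, made reproducible via a fixed seed."""
    rng = random.Random(seed)  # seeded generator: same seed -> same resamples
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]
        means.append(sum(resample) / len(resample))
    return sum(means) / n_resamples

data = [2.1, 3.4, 1.9, 4.2, 3.3]
# Two independent runs are bit-identical because nothing varies between them.
assert bootstrap_mean(data) == bootstrap_mean(data)
```

Without the seed (or with an unrecorded software environment), two runs of the same script could legitimately differ, which is exactly the situation good data management tries to avoid.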

(2022) Definitions by the Turing Way

The Turing Way handbook (The Turing Way Community 2022) added the concept of ‘robustness’ and systematized the different aspects of reproducibility in the matrix shown in Figure 2.

Figure 2: Definitions of reproducibility from The Turing Way

(2023) Definitions by the iRISE project

The EU-funded project iRISE (improving Reproducibility In SciencE) highlights, in the glossary of its Open Knowledge Base, another aspect of reproducibility that links the concept to open science practices: ‘the extent to which design, implementation, analysis, and reporting of a study enable a third party to repeat the study and assess its findings’. We recommend the iRISE glossary as a resource for those interested in recent in-depth discussion of the terminology around reproducibility. In this code of conduct we will use the earlier terminology from the Turing Way, as it is standard and established in various areas of empirical research.

The FAIR principles

The FAIR principles of Findability, Accessibility, Interoperability, and Reusability (Wilkinson et al. 2016) are the most widely adopted framework for guiding the management of digital resources, in particular scientific data and metadata. They define guiding principles and practices intended to enable both machines and humans to reuse research data. These core precepts have led to a myriad of guidelines and initiatives to define, evaluate, and implement them in practice. We endorse the flexible FAIRification framework (Welter et al. 2023), which emphasizes three key practical aspects: setting realistic, incremental goals; tailoring support to individual needs; and working in multidisciplinary teams that involve all scientific and infrastructure stakeholders relevant to the project.

We recommend following a flexible approach that aims for a balanced “FAIR enough” status of the data, depending on the available resources and required capabilities.

Getting started

Each FAIR principle has subprinciples whose definitions may vary slightly depending on the field. FAIRification is a process whose ultimate goal is to turn data into machine-actionable digital objects. That goal is not always achievable and requires substantial resources. At a minimum, however, researchers can aim to produce metadata that is human- and machine-readable to a degree sufficient for experts to reuse the data without unreasonable effort. FAIR does not necessarily mean the data will be open and publicly available. Thus, as a starting point, we consider that a researcher should:

  1. Provide sufficient machine-readable metadata and context for their data.

  2. Share or publish the data in repositories that are FAIR-compatible to a reasonable degree (e.g., provide persistent identifiers and require metadata).
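As an illustration of point 1, a minimal machine-readable metadata record can be serialised as JSON. The field names below loosely follow DataCite-style conventions, and all values are hypothetical placeholders, not a real dataset:

```python
import json

# A minimal, machine-readable metadata record for a dataset, loosely
# modelled on DataCite-style fields. All values are illustrative
# placeholders invented for this sketch.
metadata = {
    "identifier": {"identifierType": "DOI", "identifier": "10.0000/example"},
    "title": "Example behavioural dataset",
    "creators": [{"name": "Doe, Jane", "affiliation": "Example University"}],
    "publicationYear": 2024,
    "description": "Reaction times from a pilot visual search experiment.",
    "subjects": ["visual search", "reaction time"],
    "rights": "CC BY 4.0",
}

# Serialising to JSON produces a record both humans and machines can parse.
record = json.dumps(metadata, indent=2)
print(record)
```

Even a flat record like this, deposited alongside the data, covers the essentials a reuser needs: who produced the data, what they describe, and under which license they may be reused.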

Summary of FAIR principles

Findable: It should be possible for others to discover your data. Rich metadata should be available online in a searchable resource, and the data should be assigned a persistent identifier.

  • F1: (Meta)data are assigned globally unique and persistent identifiers.
  • F2: Data are described with rich metadata.
  • F3: Metadata clearly and explicitly include the identifier of the data they describe.
  • F4: (Meta)data are registered or indexed in a searchable resource.

Accessible: It should be possible for humans and machines to gain access to your data, under specific conditions or restrictions where appropriate. FAIR does not mean that data need to be open! There should be metadata, even if the data are not accessible.

  • A1: (Meta)data are retrievable by their identifier using a standardised communication protocol.
  • A1.1: The protocol is open, free, and universally implementable.
  • A1.2: The protocol allows for an authentication and authorisation procedure where necessary.
  • A2: Metadata should be accessible even when the data are no longer available.

Interoperable: Data and metadata should conform to recognized formats and standards to allow them to be combined and exchanged.

  • I1: (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
  • I2: (Meta)data use vocabularies that follow the FAIR principles.
  • I3: (Meta)data include qualified references to other (meta)data.

Reusable: Ample documentation is needed to support data interpretation and reuse. The data should conform to community norms and be clearly licensed so others know what kinds of reuse are permitted.

  • R1: (Meta)data are richly described with a plurality of accurate and relevant attributes.
  • R1.1: (Meta)data are released with a clear and accessible data usage license.
  • R1.2: (Meta)data are associated with detailed provenance.
  • R1.3: (Meta)data meet domain-relevant community standards.
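Principle A1 (retrieval by identifier over a standardised protocol) can be illustrated with DOI content negotiation: the doi.org resolver honours an HTTP Accept header and redirects to a registry that returns citation metadata as JSON. The sketch below only constructs the request; the actual network call is left commented out, and the example DOI is that of the Wilkinson et al. (2016) paper cited above.

```python
import json
import urllib.request

# Sketch of FAIR principle A1: (meta)data retrievable by their identifier
# using a standardised communication protocol. DOI content negotiation lets
# a client ask the doi.org resolver for machine-readable citation metadata.
def metadata_request(doi: str) -> urllib.request.Request:
    """Build a content-negotiation request for CSL JSON metadata."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )

req = metadata_request("10.1038/sdata.2016.18")  # the FAIR principles paper
# Uncomment to fetch the metadata over the network:
# with urllib.request.urlopen(req) as resp:
#     meta = json.load(resp)
#     print(meta["title"])
```

Because the protocol (HTTP) is open, free, and universally implementable, any client that knows the identifier can retrieve the metadata the same way, which is precisely what A1 and A1.1 ask for.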

Selected resources

  • FAIR data page from the library of the University of Zurich. It includes tutorials and checklists.
  • FAIR checker is a web app that helps you improve your metadata and inspect the metadata of published data. Just add the URL of a dataset; it will give a ‘FAIRness’ score and describe how well each FAIR principle is fulfilled.

Preregistration

A preregistration specifies the research questions and the analysis plan before the study is conducted. It thus makes it possible to clearly distinguish the testing of a hypothesis with new data from the generation of hypotheses from existing observations, making research more robust, replicable, and transparent (Nosek et al. 2018).

Preregistering a study does not mean you cannot conduct additional exploratory analyses or change plans after the data are collected. It merely clarifies the difference between analyses decided a priori and those decided after looking at the data. If a publication derives from a preregistered study, it is important to disclose and justify such changes, which are common, transparently in the manuscript.

Preregistration has some additional benefits:

  • Preregistration can prevent unnecessary repetition of studies, since it promotes a systematic search of preceding studies and helps formulate relevant research questions.
  • Much of the information included in a preregistration is often required for ethical approval, so the effort invested in preparing a preregistration can be relatively low. In fact, having a well-defined analysis plan can streamline analytic decisions and save time in later stages of a research project.

We recommend adopting one of the preregistration forms listed below. The simplest option, such as a minimal template for a preregistered research plan, is a good starting point.

Forms of preregistration

  • Preregistration. Consists of registering a research plan (hypotheses, methods, analyses) before a study is conducted. A preregistration does not necessarily have to be public, as long as it is registered and time-stamped on a trusted platform. Read more at https://www.cos.io/initiatives/prereg.

  • Registered reports. Involve a two-stage peer-review process in which a study proposal is reviewed before data collection and the final manuscript is reviewed after analysis. At stage 1, researchers can obtain in-principle acceptance: even if the planned analyses find no statistically significant results, the paper will still be accepted for publication. Read more at https://www.cos.io/initiatives/registered-reports and in Hardwicke and Ioannidis (2018).

  • Peer Community In Registered Reports (PCI-RR). PCI is a researcher-run initiative that publishes peer reviews of preprints, and the PCI-RR community is dedicated to receiving, reviewing, and recommending registered reports across disciplines. Read more and find a list of PCI-RR-friendly journals at https://rr.peercommunityin.org/about/.

Selected resources

  • Open Science Framework (OSF). A platform where you can create and upload project registrations and preregistrations. It offers a variety of templates in multiple formats.

  • AsPredicted. A platform that facilitates preregistration through a very simple, easy-to-fill-in questionnaire. Answering its questions (see this sample) generates a time-stamped, single-page PDF document with a unique URL for verification. You cannot import templates from other preregistration platforms into AsPredicted, but you can use the AsPredicted document in other platforms (e.g., OSF).

  • PreclinicalTrials.eu. A platform aimed at providing a comprehensive database of preclinical animal study protocols. Studies can be registered anonymously, free of charge, and with an optional embargo period. Its adoption after three years was evaluated in a report by Van Der Naald et al. (2022).

Data Management Plans (DMPs)

We recommend using a data management plan (DMP) as a living document to guide research data management in a project. Besides fulfilling funding agencies’ requirements for data sharing and publishing, DMPs can be very useful tools for improving the efficiency of a research workflow.

SNSF requirements

The SNSF requires a DMP for approved applications (see the SNSF guidelines). A DMP should give information on the data life cycle. The minimum structure and content that a DMP should provide are shown in the SNSF’s DMP form, which includes four sections:

  1. Data collection and documentation

  2. Ethics, legal and security issues

  3. Data storage and preservation

  4. Data sharing and reuse.

The DMP remains editable during the entire lifetime of the grant, and its contents can be adapted as the project evolves. It is therefore recommended to track changes and keep the document under version control.

The content of a DMP can be relatively broad, and DMPs are very individual. They should be plausible, suit their project, and meet the standards of the research community.

General recommendations for DMPs

Our recommendations are the following:

  • Use SNSF form-based templates when available.

  • Include enough details so that this document can accompany the project and be useful to conduct the data-related actions at different stages of the data life cycle. A DMP can be a very useful tool for project management.

  • Update it when changes or deviations from the original plans are needed.

  • Use version control and track changes.

  • Share it with collaborators. Within a project, it can be used as an onboarding document and as a sort of project blueprint. When shared outside a project, it can help communities to adopt similar standards and facilitate collaboration and data reuse.

  • Consult a data steward for input when writing a DMP. A data steward, or someone in a similar role, can help find DMPs from similar fields, choose community standards, assess a project’s data-management needs, and review the DMP’s content so that it is sufficiently detailed.

Selected resources

  • Examples: DMPs at LiRI (Linguistic Research Infrastructure, UZH).
  • Examples: DMP examples from the UK’s Digital Curation Centre.
  • Examples: DMP examples by the University of Bern.
  • Generator: DMP generator by the Swiss Institute of Bioinformatics. It has predefined boxes that generate a document with predefined paragraphs. Requires a SWITCH edu-ID login. Oriented towards the life sciences.
  • Form: SNSF form for DMPs (link to PDF).
  • Form: Data Life-Cycle Management (DLCM) project form, based on the SNSF’s template.
  • Checklist: Data Life-Cycle Management (DLCM) project DMP checklist.
  • Template: École Polytechnique Fédérale de Lausanne (EPFL) DMP template (link to PDF).

The 3R principles

The 3R principles for animal experimentation were postulated in 1959 by William Russell and Rex Burch in their book Principles of Humane Experimental Technique. Today the principles are widely recognized internationally by scientists as a moral obligation, and they are implemented in many national laws on animal protection.

We recommend using these principles and associated resources as guidance to improve research data management. Generating reusable data will help reduce research waste and maximize the utility of animal experiments.

Table 1: The 3R principles for more humane animal research

  • Replacement: Methods that achieve a purpose without conducting experiments or procedures on animals. This includes approaches that do not use living animals, such as cultured cells, tissues, organs, and in silico computational methods, as well as the use of animals not considered capable of experiencing suffering, such as some invertebrates and immature vertebrates.
  • Reduction: Methods that obtain comparable information using fewer animals or maximize data collection from the same number of animals. Strategies include optimizing breeding programs, experimental design, and statistical analysis; sharing animal materials (organs, tissues, cells); using longitudinal instead of cross-sectional measurements; and reducing unexplained variation in data.
  • Refinement: Methods that minimize pain, suffering, and distress while enhancing animal well-being, for example improving housing, handling, anesthesia, and analgesia; habituating animals to procedures; applying humane endpoints; and developing better tools to assess suffering and well-being.

The Swiss 3R Competence Centre

The Swiss 3R Competence Centre (swiss3RCC) is a national centre that promotes research, education, and communication on the replacement, reduction, and refinement of animal experimentation (the 3R principles). More information on its extended network, including observer members, 3Rs centres, and international partners, can be found on its website: https://swiss3rcc.org/network-and-partners.


References

Araujo, Pedro, Christina Bornatici, and Marieke Heers. 2024. “Recognising Open Research Data in Research Assessment: Overview of Practices and Challenges,” May. https://doi.org/10.5281/ZENODO.11060207.
Fecher, Benedikt, and Sascha Friesike. 2014. “Open Science: One Term, Five Schools of Thought.” In Opening Science, edited by Sönke Bartling and Sascha Friesike, 17–47. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-00026-8_2.
Hardwicke, Tom E., and John P. A. Ioannidis. 2018. “Mapping the Universe of Registered Reports.” Nature Human Behaviour 2 (11): 793–96. https://doi.org/10.1038/s41562-018-0444-y.
Held, Leonhard, and Simon Schwab. 2020. “Improving The Reproducibility of Science.” Significance 17 (1): 10–11. https://doi.org/10.1111/j.1740-9713.2020.01351.x.
National Academies of Sciences, Engineering and Medicine. 2019. Reproducibility and Replicability in Science. Washington, D.C.: National Academies Press. https://doi.org/10.17226/25303.
Nosek, Brian A., Charles R. Ebersole, Alexander C. DeHaven, and David T. Mellor. 2018. “The Preregistration Revolution.” Proceedings of the National Academy of Sciences 115 (11): 2600–2606. https://doi.org/10.1073/pnas.1708274114.
The Turing Way Community. 2022. The Turing Way: A Handbook for Reproducible, Ethical and Collaborative Research (1.0.2). Zenodo. https://doi.org/10.5281/ZENODO.3233853.
Van Der Naald, Mira, Steven A J Chamuleau, Julia M L Menon, Wim De Leeuw, Judith De Haan, Dirk J Duncker, and Kimberley Elaine Wever. 2022. “Preregistration of Animal Research Protocols: Development and 3-Year Overview of Preclinicaltrials.eu.” BMJ Open Science 6 (1). https://doi.org/10.1136/bmjos-2021-100259.
Welter, Danielle, Nick Juty, Philippe Rocca-Serra, Fuqi Xu, David Henderson, Wei Gu, Jolanda Strubel, et al. 2023. “FAIR in Action - a Flexible Framework to Guide FAIRification.” Scientific Data 10 (1): 291. https://doi.org/10.1038/s41597-023-02167-2.
Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1): 160018. https://doi.org/10.1038/sdata.2016.18.