What Is Open Reproducible Science?
- Define open reproducible science and explain its importance.
- Describe how reproducibility can benefit yourself and others.
- List tools that can help you implement open reproducible science workflows.
What is Open Reproducible Science
Open science involves making scientific methods, data, and outcomes available to everyone. It can be broken down into several parts (Gezelter 2009) including:
- Transparency in data collection, processing and analysis methods, and derivation of outcomes.
- Publicly available data and associated processing methods.
- Transparent communication of results.
Open science is also often supported by collaboration.
Reproducible science is when anyone (including others and your future self) can understand and replicate the steps of an analysis, applied to the same or even new data.
Together, open reproducible science results from open science workflows that allow you to easily share work and collaborate with others, as well as openly publish your data and workflows to contribute to greater scientific knowledge.
![This figure shows an open science workflow, highlighting the roles of data, code, and workflows. Source: Max Joseph, Earth Lab at University of Colorado, Boulder.](../../../images/courses/earth-analytics/bootcamp/open-science/workflow.png)
Watch this 15-minute video to learn more about the importance of reproducibility in science and the current reproducibility “crisis.”
Benefits of Open Reproducible Science
Benefits of openness and reproducibility in science include:
- Transparency in the scientific process, as anyone including the general public can access the data, methods, and results.
- Ease of replication and extension of your work by others, which further supports peer review and collaborative learning in the scientific community.
- It supports you! You can easily understand and re-run your own analyses as often as needed and after time has passed.
How Do You Make Your Work More Open and Reproducible?
The items below are things that you can begin to do to make your work more open and reproducible. It can be overwhelming to think about doing everything at once. However, each item is something that you can work towards.
Use Scientific Programming to Process Data
Scientific programming allows you to automate tasks, so that your workflows can be run and replicated quickly. In contrast, graphical user interface (GUI) based workflows require interactive manual steps for processing, which are more difficult and time consuming to reproduce. If you use an open source programming language like Python or R, then anyone has access to your methods. However, if you use a tool that requires a license, then people without the resources to purchase that tool are excluded from fully reproducing your workflow.
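As a minimal sketch of what automating a task can look like, the Python code below applies the same summary to every CSV file in a directory. The function names, the directory, and the column name are hypothetical and only serve as an illustration; your own workflow will likely use different data and tools.

```python
import csv
from pathlib import Path
from statistics import mean


def summarize_csv(path, column):
    """Return the mean of one numeric column in a single CSV file."""
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return mean(values)


def summarize_directory(data_dir, column):
    """Apply the same summary to every CSV file in a directory.

    Returns a dictionary that maps each file name to its column mean.
    """
    return {
        path.name: summarize_csv(path, column)
        for path in sorted(Path(data_dir).glob("*.csv"))
    }
```

Because the steps are written as code, re-running the analysis on new files only requires calling `summarize_directory()` again, rather than repeating manual steps in a GUI.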
Use Expressive Names for Files and Directories to Organize Your Work
Expressive file and directory names allow you to quickly find what you need and also support reproducibility by facilitating others’ understanding of your files and workflows (e.g. names can tell others what the file or directory contains and its purpose). Be sure to organize related files into directories (i.e. folders) that can help you easily categorize and find what you need (e.g. raw-data, scripts, results).
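One possible way to set up such a directory structure is sketched below using Python's `pathlib` module. The function name `create_project_dirs` and the project name in the usage note are hypothetical; the subdirectory names are taken from the example above.

```python
from pathlib import Path


def create_project_dirs(project_root, subdirs=("raw-data", "scripts", "results")):
    """Create expressive subdirectories inside a project directory.

    Existing directories are left untouched, so this is safe to re-run.
    """
    root = Path(project_root)
    created = []
    for name in subdirs:
        directory = root / name
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

For example, `create_project_dirs("colorado-precip-analysis")` would create `raw-data`, `scripts`, and `results` directories inside a project directory whose name tells others what the project is about.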
Use FAIR Data to Enhance the Reproducibility of Projects
Make sure that the data used in your project adhere to the FAIR principles (Wilkinson et al. 2016), so that they are findable, accessible, interoperable, and re-usable, and there is documentation on how to access them and what they contain. FAIR principles also extend beyond the raw data to apply to the tools and workflows that are used to process and create new data. FAIR principles enhance the reproducibility of projects by supporting the reuse and expansion of your data and workflows, which contributes to greater discovery within the scientific community.
Protect Your Raw Data
Don’t modify (or overwrite) the raw data. Keep data outputs separate from inputs, so that you can easily re-run your workflow as needed. This is easily done if you organize your data into directories that separate the raw data from your results, etc.
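The sketch below shows one way to keep outputs separate from inputs in Python. It reads a raw file, applies a processing step, and writes the output to a separate results directory, refusing to write anywhere inside the raw data directory. The function name `write_derived` and the `transform` argument are hypothetical, and the check is only an illustration, not a complete safeguard.

```python
from pathlib import Path


def write_derived(raw_path, results_dir, transform):
    """Process a raw text file and save the output outside the raw data directory.

    The raw file is only read, never modified.
    """
    raw_path = Path(raw_path).resolve()
    results_dir = Path(results_dir).resolve()
    raw_dir = raw_path.parent

    # Refuse to write outputs into (or below) the raw data directory
    if results_dir == raw_dir or raw_dir in results_dir.parents:
        raise ValueError("results directory must be outside the raw data directory")

    results_dir.mkdir(parents=True, exist_ok=True)
    out_path = results_dir / raw_path.name
    out_path.write_text(transform(raw_path.read_text()))
    return out_path
```

Since the raw file is never changed, you can re-run the workflow as often as needed and always start from the same inputs.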
Document Your Workflows
Documentation can mean many different things. It can be as basic as including (carefully crafted and to the point) comments throughout your code to explain the specific steps of your workflow. Documentation can also mean using tools such as Jupyter Notebooks or R Markdown files to include a text narrative in Markdown format that is interspersed with code to provide a high-level explanation of a workflow.
Documentation can also include docstrings, which provide standardized documentation of Python functions, or even README files that describe the bigger picture of your workflow, directory structure, data, processing, and outputs.
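As an illustration, the hypothetical function below combines a docstring with a short comment. The docstring layout follows the NumPy style, which is an assumption here; it is one common convention among several, and the function itself is only an example.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from Fahrenheit to Celsius.

    Parameters
    ----------
    temp_f : float
        Temperature in degrees Fahrenheit.

    Returns
    -------
    float
        Temperature in degrees Celsius.
    """
    # Subtract the freezing point offset, then rescale the degree size
    return (temp_f - 32) * 5 / 9
```

Anyone using the function can read this documentation by calling `help(fahrenheit_to_celsius)`, without needing to open the source file.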
Design Workflows That Can Be Easily Recreated
You can design workflows that others can easily recreate and reproduce by:
- Listing all packages and dependencies required to run a workflow at the top of the code file (e.g. Jupyter Notebook or R Markdown files).
- Organizing your code into sections, or code blocks, of related code and including comments to explain the code.
- Creating reusable environments for Python workflows using tools like Docker containers, conda environments, and interactive notebooks with Binder.
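As a small complement to these practices, the sketch below records the Python version and the installed versions of the packages that a workflow depends on, using the standard library's `importlib.metadata` module (available in Python 3.8 and later). The function name `describe_environment` and the package names in the usage note are hypothetical. This does not replace a conda environment or a Docker container; it only documents what was installed when the workflow ran.

```python
import sys
from importlib import metadata


def describe_environment(packages):
    """Return the Python version and the installed version of each package."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Flag missing dependencies instead of stopping the report
            report[name] = "not installed"
    return report
```

For example, calling `describe_environment(["numpy", "pandas"])` at the top of a notebook gives readers a record of the versions that were used to produce the results.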