{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What Is Open Reproducible Science?\n",
"\n",
"Jenny Palomino \n",
"Leah Wasser \n",
"Max Joseph \n",
"Elsa Culler\n",
"\n",
"## What is Open Reproducible Science\n",
"\n",
"Open science involves making scientific methods, data, and outcomes\n",
"available to everyone. It can be broken down into several parts\n",
"(Gezelter\n",
"2009) including:\n",
"\n",
"- Transparency in data collection, processing and analysis methods,\n",
" and derivation of outcomes.\n",
"- Publicly available data and associated processing methods.\n",
"- Transparent communication of results.\n",
"\n",
"Open science is also often supported by collaboration.\n",
"\n",
"Reproducible science is when anyone (including others and your future\n",
"self) can understand and replicate the steps of an analysis, applied to\n",
"the same or even new data.\n",
"\n",
"Together, open reproducible science results from open science workflows\n",
"that allow you to easily share work and collaborate with others as well\n",
"as openly publish your data and workflows to contribute to greater\n",
"science knowledge."
],
"id": "973344e8-20d8-4c7e-bb51-54000041d633"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An open science workflow highlighting the roles of data, code, and\n",
"workflows. Source: Max Joseph, Earth Lab at University of Colorado,\n",
"Boulder."
],
"id": "4a594052-1de4-4946-8370-00851c5f1842"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"> Watch this 15 minute video to learn more about the importance of\n",
"> reproducibility in science and the current reproducibility “crisis.”\n",
"\n",
"## Benefits of Open Reproducible Science\n",
"\n",
"Benefits of openness and reproducibility in science include:\n",
"\n",
"- Transparency in the scientific process, as anyone including the\n",
" general public can access the data, methods, and results.\n",
"- Ease of replication and extension of your work by others, which\n",
" further supports peer review and collaborative learning in the\n",
" scientific community.\n",
"- It supports you! You can easily understand and re-run your own\n",
" analyses as often as needed and after time has passed.\n",
"\n",
"## How Do You Make Your Work More Open and Reproducible?\n",
"\n",
"The list below are things that you can begin to do to make your work\n",
"more open and reproducible. It can be overwhelming to think about doing\n",
"everything at once. However, each item is something that you could work\n",
"towards.\n",
"\n",
"### Use Scientific Programming to Process Data\n",
"\n",
"Scientific programming allows you to automate tasks, which facilitates\n",
"your workflows to be quickly run and replicated. In contrast, graphical\n",
"user interface (GUI) based workflows require interactive manual steps\n",
"for processing, which become more difficult and time consuming to\n",
"reproduce. If you use an open source programming language like `Python`\n",
"or `R`, then anyone has access to your methods. However, if you use a\n",
"tool that requires a license, then people without the resources to\n",
"purchase that tool are excluded from fully reproducing your workflow.\n",
"\n",
"### Use Expressive Names for Files and Directories to Organize Your Work\n",
"\n",
"Expressive\n",
"file and directory names allow you to quickly find what you need and\n",
"also support reproducibility by facilitating others’ understanding of\n",
"your files and workflows (e.g. names can tell others what the file or\n",
"directory contains and its purpose). Be sure to organize related files\n",
"into directories (i.e. folders) that can help you easily categorize and\n",
"find what you need (e.g. raw-data, scripts, results).\n",
"\n",
"### Use FAIR Data to Enhance the Reproducibility of Projects\n",
"\n",
"Make sure that the data used in your project adhere to the FAIR\n",
"principles\n",
"(Wilkinson\n",
"et al. 2016), so that they are findable, accessible, interoperable,\n",
"and re-usable, and there is documentation on how to access them and what\n",
"they contain. FAIR principles also extend beyond the raw data to apply\n",
"to the tools and workflows that are used to process and create new data.\n",
"FAIR principles enhance the reproducibility of projects by supporting\n",
"the reuse and expansion of your data and workflows, which contributes to\n",
"greater discovery within the scientific community.\n",
"\n",
"### Protect Your Raw Data\n",
"\n",
"Don’t modify (or overwrite) the raw data. Keep data outputs separate\n",
"from inputs, so that you can easily re-run your workflow as needed. This\n",
"is easily done if you organize your data into directories that separate\n",
"the raw data from your results, etc.\n",
"\n",
"### Use Version Control and Share Your Code (If You Can)\n",
"\n",
"Version control allows you to manage and track changes to your files\n",
"(and even undo them!). If you can openly share your code, implement\n",
"version control and then publish your code and workflows on the cloud.\n",
"There are many free tools to do this including\n",
"Git and\n",
"GitHub.\n",
"\n",
"### Document Your Workflows\n",
"\n",
"Documentation can mean many different things. It can be as basic as\n",
"including (carefully crafted and to the point) comments throughout your\n",
"code to explain the specific steps of your workflow. Documentation can\n",
"also mean using tools such as Jupyter Notebooks or RMarkdown files to\n",
"include a text narrative in Markdown format that is interspersed with\n",
"code to provide high level explanation of a workflow.\n",
"\n",
"Documentation can also include\n",
"docstrings,\n",
"which provide standardized documentation of Python functions, or even\n",
"README files that describe the bigger picture of your workflow,\n",
"directory structure, data, processing, and outputs.\n",
"\n",
"### Design Workflows That Can Be Easily Recreated\n",
"\n",
"You can design\n",
"workflows\n",
"that can be easily recreated and reproduced by others by: \\* listing\n",
"all packages and dependencies required to run a workflow at the top of\n",
"the code file (e.g. Jupyter Notebook or R Markdown files). \\* organizing\n",
"your code into sections, or code blocks, of related code and include\n",
"comments to explain the code. \\* creating reusuable environments for\n",
"Python workflows using tools like\n",
"docker\n",
"containers,\n",
"conda\n",
"environments, and\n",
"interactive notebooks\n",
"with binder.\n",
"\n",
"> **Open Reproducible Science - A Case Study**\n",
">\n",
"> Chaya is a scientist at Generic University, studying the role of\n",
"> invasive grasses on fires in grassland areas. She is building models\n",
"> of fire spread as they relate to vegetation cover. This model uses\n",
"> data collected from satellites that detect wildfires and also plant\n",
"> cover maps. After documenting that an invasive plant drastically\n",
"> alters fire spread rates, she is eager to share her findings with the\n",
"> world. Chaya uses scientific programming rather than a graphical user\n",
"> interface tool such as Excel to process her data and run the model to\n",
"> ensure that the process is automated. Chaya writes a manuscript on her\n",
"> findings. When she is ready to submit her article to a journal, she\n",
"> first posts a preprint of the article on a preprint server, stores\n",
"> relevant data in a data repository and releases her code on GitHub.\n",
"> This way, the research community can provide feedback on her work, the\n",
"> reviewers and others can reproduce her analysis, and she has\n",
"> established precedent for her findings.\n",
">\n",
 In the first review">
"> In the first review of her paper, which is returned three months later,\n",
"> many changes are suggested which impact her final figures. Updating\n",
"> figures could be a tedious process. However, in this case, Chaya has\n",
"> developed these figures using the Python programming language. Thus,\n",
"> updating figures is easily done by modifying the processing methods\n",
 used to create them.">
"> used to create them. Further, because she stored her data and code in a\n",
"> public repository on GitHub, it is easy and quick for Chaya three\n",
"> months later to find the original data and code that she used and to\n",
"> update the workflow as needed to produce the revised versions of her\n",
"> figures. Throughout the review process, the code (and perhaps data)\n",
"> are updated, and new versions of the code are tracked. Upon acceptance\n",
"> of the manuscript, the preprint can be updated, along with the code\n",
"> and data to ensure that the most recent version of the paper and\n",
"> analysis are openly available for anyone to use."
],
"id": "70c32faa-3868-4d68-ba4e-a1d51825315f"
}
],
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"path": "/usr/share/miniconda/envs/learning-portal/share/jupyter/kernels/python3"
},
"language_info": {
"name": "python",
"codemirror_mode": {
"name": "ipython",
"version": "3"
},
"file_extension": ".py",
"mimetype": "text/x-python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.17"
}
}
}