Principles for data analysis workflows.
A systematic and reproducible "workflow" (the process that moves a scientific investigation from raw data to coherent research question to insightful contribution) should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a rep...
Main Authors: | Sara Stoudt, Váleri N Vásquez, Ciera C Martinez |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2021-03-01 |
Series: | PLoS Computational Biology |
Online Access: | https://doi.org/10.1371/journal.pcbi.1008770 |
id |
doaj-69081b5ececd439e961766c0c1ab272b |
---|---|
record_format |
Article |
spelling |
doaj-69081b5ececd439e961766c0c1ab272b; 2021-07-29T04:34:27Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2021-03-01; 17; 3; e1008770; 10.1371/journal.pcbi.1008770; Principles for data analysis workflows.; Sara Stoudt; Váleri N Vásquez; Ciera C Martinez; A systematic and reproducible "workflow" (the process that moves a scientific investigation from raw data to coherent research question to insightful contribution) should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining 3 phases: the Explore, Refine, and Produce Phases. Each phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between design principles and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggestions for practices and tools to advance reproducible, sound data-intensive analysis may furnish support for both students new to research and current researchers who are new to data-intensive work.; https://doi.org/10.1371/journal.pcbi.1008770 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Sara Stoudt, Váleri N Vásquez, Ciera C Martinez |
title |
Principles for data analysis workflows. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS Computational Biology |
issn |
1553-734X, 1553-7358 |
publishDate |
2021-03-01 |
description |
A systematic and reproducible "workflow" (the process that moves a scientific investigation from raw data to coherent research question to insightful contribution) should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining 3 phases: the Explore, Refine, and Produce Phases. Each phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between design principles and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggestions for practices and tools to advance reproducible, sound data-intensive analysis may furnish support for both students new to research and current researchers who are new to data-intensive work. |
url |
https://doi.org/10.1371/journal.pcbi.1008770 |