Principles for data analysis workflows.

A systematic and reproducible "workflow" (the process that moves a scientific investigation from raw data to coherent research question to insightful contribution) should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a rep...

Bibliographic Details
Main Authors: Sara Stoudt, Váleri N Vásquez, Ciera C Martinez
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-03-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1008770
id doaj-69081b5ececd439e961766c0c1ab272b
record_format Article
spelling doaj-69081b5ececd439e961766c0c1ab272b | 2021-07-29T04:34:27Z | eng | Public Library of Science (PLoS) | PLoS Computational Biology | 1553-734X | 1553-7358 | 2021-03-01 | 17 | 3 | e1008770 | 10.1371/journal.pcbi.1008770 | Principles for data analysis workflows. | Sara Stoudt | Váleri N Vásquez | Ciera C Martinez | A systematic and reproducible "workflow" (the process that moves a scientific investigation from raw data to coherent research question to insightful contribution) should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining 3 phases: the Explore, Refine, and Produce Phases. Each phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between design principles and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggestions for practices and tools to advance reproducible, sound data-intensive analysis may furnish support for both students new to research and current researchers who are new to data-intensive work. | https://doi.org/10.1371/journal.pcbi.1008770
collection DOAJ
language English
format Article
sources DOAJ
author Sara Stoudt
Váleri N Vásquez
Ciera C Martinez
title Principles for data analysis workflows.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2021-03-01
description A systematic and reproducible "workflow" (the process that moves a scientific investigation from raw data to coherent research question to insightful contribution) should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining 3 phases: the Explore, Refine, and Produce Phases. Each phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between design principles and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggestions for practices and tools to advance reproducible, sound data-intensive analysis may furnish support for both students new to research and current researchers who are new to data-intensive work.
url https://doi.org/10.1371/journal.pcbi.1008770