eddy4R 0.2.0: a DevOps model for community-extensible processing and analysis of eddy-covariance data based on R, Git, Docker, and HDF5

Bibliographic Details
Main Authors: S. Metzger, D. Durden, C. Sturtevant, H. Luo, N. Pingintha-Durden, T. Sachs, A. Serafimovich, J. Hartmann, J. Li, K. Xu, A. R. Desai
Format: Article
Language: English
Published: Copernicus Publications, 2017-08-01
Series: Geoscientific Model Development
Online Access: https://www.geosci-model-dev.net/10/3189/2017/gmd-10-3189-2017.pdf
Record ID (DOAJ): doaj-52ad2aa91ddc47298617b9648147b55a
Citation: Geoscientific Model Development, 10, 3189-3206, 2017
DOI: 10.5194/gmd-10-3189-2017
ISSN: 1991-959X, 1991-9603

Affiliations
S. Metzger: National Ecological Observatory Network, Battelle, 1685 38th Street, Boulder, CO 80301, USA; University of Wisconsin-Madison, Dept. of Atmospheric and Oceanic Sciences, 1225 West Dayton Street, Madison, WI 53706, USA
D. Durden, C. Sturtevant, H. Luo, N. Pingintha-Durden: National Ecological Observatory Network, Battelle, 1685 38th Street, Boulder, CO 80301, USA
T. Sachs, A. Serafimovich: GFZ German Research Centre for Geosciences, Telegrafenberg, 14473 Potsdam, Germany
J. Hartmann: Alfred Wegener Institute – Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany
J. Li: LI-COR Biosciences, 4647 Superior Street, Lincoln, NE 68504, USA
K. Xu, A. R. Desai: University of Wisconsin-Madison, Dept. of Atmospheric and Oceanic Sciences, 1225 West Dayton Street, Madison, WI 53706, USA

Abstract
Large differences in instrumentation, site setup, data format, and operating system stymie the adoption of a universal computational environment for processing and analyzing eddy-covariance (EC) data. This results in limited software applicability and extensibility, as well as often substantial inconsistencies in flux estimates. Addressing these concerns, this paper presents the systematic development of portable, reproducible, and extensible EC software achieved by adopting a development and systems operation (DevOps) approach. This software development model is used to create the eddy4R family of EC code packages in the open-source R language for statistical computing. These packages are community developed, iterated via the Git distributed version control system, and wrapped into a portable and reproducible Docker filesystem that is independent of the underlying host operating system. The HDF5 hierarchical data format then provides a streamlined mechanism for highly compressed and fully self-documented data ingest and output.

The usefulness of the DevOps approach was evaluated for three test applications. First, the resultant EC processing software was used to analyze standard flux tower data from the first EC instruments installed at a National Ecological Observatory Network (NEON) field site. Second, through an aircraft test application, we demonstrate the modular extensibility of eddy4R to analyze EC data from other platforms. Third, an intercomparison with commercial-grade software showed excellent agreement (R² = 1.0 for CO₂ flux). In conjunction with this study, we publicly release a Docker image containing the first two eddy4R packages and an executable example workflow, as well as the first NEON EC data products. We conclude by describing the work remaining to arrive at the automated generation of science-grade EC fluxes, and the benefits to the science community at large.
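The intercomparison above summarizes agreement between eddy4R and commercial-grade software as a coefficient of determination. The sketch below assumes R² is computed as the squared Pearson correlation between paired flux estimates (for an ordinary least-squares fit with an intercept, this equals the regression R²); the record does not state how the value was obtained. The function name `r_squared` and the example values are hypothetical illustrations, not eddy4R code or published data.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two paired flux series.

    Hypothetical helper for illustration; not part of eddy4R.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered cross-product and sums of squares
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical paired fluxes from two processing packages (not real data)
package_a = [1.0, 2.0, 3.0, 4.0]
package_b = [2.0, 4.0, 6.0, 8.0]  # exactly twice package_a
print(r_squared(package_a, package_b))  # 1.0
```

Because `package_b` is twice `package_a` yet R² is exactly 1.0, R² alone does not reveal a systematic scaling or offset between two packages; regression slope and intercept are commonly reported alongside it.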

This software development model is applicable beyond EC and, more generally, builds the capacity to deploy complex algorithms developed by scientists in an efficient and scalable manner. In addition, modularity permits meeting project milestones while retaining extensibility over time.
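The EC fluxes discussed in the abstract, such as the CO₂ flux, rest on one core quantity: the covariance between fluctuations of vertical wind speed w and a scalar c over an averaging period, obtained by Reynolds decomposition (x' = x − x̄) and written as the mean of w'c'. The sketch below shows only that step under simplifying assumptions. It is not eddy4R's implementation and omits steps a production processing chain applies, such as despiking, coordinate rotation, time-lag correction, detrending, and density and spectral corrections. The function name `kinematic_flux` and the example series are hypothetical.

```python
from typing import Sequence


def kinematic_flux(w: Sequence[float], c: Sequence[float]) -> float:
    """Block-averaged covariance of vertical wind w and scalar c.

    Hypothetical helper; assumes both series are already despiked,
    time-aligned, and sampled at the same rate over one averaging period.
    """
    if len(w) != len(c) or len(w) < 2:
        raise ValueError("w and c must have equal length of at least 2")
    n = len(w)
    w_mean = sum(w) / n
    c_mean = sum(c) / n
    # Reynolds decomposition: fluctuations about the block mean,
    # then the mean of their product, i.e. the average of w'c'.
    return sum((wi - w_mean) * (ci - c_mean) for wi, ci in zip(w, c)) / n


# Hypothetical 4-sample series: w in m s-1, c as CO2 dry mole fraction in umol mol-1
w = [0.2, -0.1, 0.3, -0.4]
c = [412.0, 410.5, 413.0, 409.5]
print(kinematic_flux(w, c))  # prints approximately 0.3625 (umol mol-1 m s-1)
```

The result is a kinematic flux; multiplying it by the molar density of dry air (mol m⁻³) would convert it to µmol m⁻² s⁻¹. A positive value indicates net upward transport.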