Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds

We describe a novel approach for experimental High-Energy Physics (HEP) data analyses that is centred around the declarative rather than the imperative paradigm when describing analysis computational tasks. The analysis process can be structured in the form of a Directed Acyclic Graph (DAG), where each graph vertex represents a unit of computation with its inputs and outputs, and the graph edges describe the interconnection of the various computational steps. We have developed REANA, a platform for reproducible data analyses, that supports several such DAG workflow specifications. The REANA platform parses the analysis workflow and dispatches its computational steps to various supported computing backends (Kubernetes, HTCondor, Slurm). The focus on declarative rather than imperative programming enables researchers to concentrate on the problem domain at hand without having to think about implementation details such as scalable job orchestration. The declarative programming approach is further exemplified by a multi-level job cascading paradigm implemented in the Yadage workflow specification language.

We present two recent LHC particle physics analyses, ATLAS searches for dark matter and CMS jet energy correction pipelines, where the declarative approach was successfully applied. We argue that the declarative approach to data analyses, combined with recent advancements in container technology, facilitates the portability of computational data analyses to various compute backends, enhancing the reproducibility and the knowledge preservation behind particle physics data analyses.
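The DAG structure described in the abstract can be illustrated with a minimal sketch: each vertex is a computational step, each edge a dependency, and a declarative runner derives a valid execution order from the graph alone rather than from an analyst-written script. This is a toy illustration using Python's standard-library `graphlib`, not REANA or Yadage code; the step names (`skim`, `histo`, `fit`, `plot`) are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical analysis DAG: each key is a computational step (vertex),
# and its set lists the steps whose outputs it consumes (edges).
dag = {
    "skim":  set(),            # select events from the input data
    "histo": {"skim"},         # histogram the skimmed events
    "fit":   {"histo"},        # fit the histograms
    "plot":  {"fit", "histo"}, # produce final plots from fits and histograms
}

# The runner only needs the declared graph: it computes an execution
# order consistent with the dependencies, instead of the analyst
# scripting the orchestration imperatively.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

In a system like REANA, an analogous dependency declaration lives in the workflow specification, and the platform (not the researcher) decides how to schedule and scale the resulting jobs on the chosen backend.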

Bibliographic Details
Main Authors: Tibor Šimko, Lukas Alexander Heinrich, Clemens Lange, Adelina Eleonora Lintuluoto, Danika Marina MacDonell, Audrius Mečionis, Diego Rodríguez Rodríguez, Parth Shandilya, Marco Vidal García
Author Affiliations: CERN, Geneva, Switzerland (Šimko, Heinrich, Lange, Lintuluoto, Mečionis, Rodríguez Rodríguez, Shandilya, Vidal García); Department of Physics, University of Helsinki, Helsinki, Finland (Lintuluoto); Department of Physics & Astronomy, University of Victoria, Victoria, BC, Canada (MacDonell); The LNM Institute of Information Technology, Jaipur, India (Shandilya)
Format: Article
Language: English
Published: Frontiers Media S.A., 2021-05-01
Series: Frontiers in Big Data
ISSN: 2624-909X
DOI: 10.3389/fdata.2021.661501
Subjects: computational workflows; reproducibility; scalability; declarative programming; analysis preservation
Online Access: https://www.frontiersin.org/articles/10.3389/fdata.2021.661501/full