Methods of Data Popularity Evaluation in the ATLAS Experiment at the LHC

The ATLAS Experiment at the LHC generates petabytes of data that is distributed among 160 computing sites all over the world and is processed continuously by various central production and user analysis tasks. The popularity of data is typically measured as the number of accesses and plays an import...


Bibliographic Details
Main Authors: Beermann Thomas, Chuchuk Olga, Di Girolamo Alessandro, Grigorieva Maria, Klimentov Alexei, Lassnig Mario, Schulz Markus, Sciaba Andrea, Tretyakov Eugeny
Format: Article
Language: English
Published: EDP Sciences 2021-01-01
Series: EPJ Web of Conferences
Online Access: https://www.epj-conferences.org/articles/epjconf/pdf/2021/05/epjconf_chep2021_02013.pdf
id doaj-a8d33560ddc24360b0e4ae5eb8e675f4
record_format Article
spelling doaj-a8d33560ddc24360b0e4ae5eb8e675f4 (2021-08-26T09:27:32Z)
DOI: 10.1051/epjconf/202125102013
Volume 251, article 02013 (epjconf_chep2021_02013)
Author affiliations:
Beermann Thomas: Bergische Universitaet Wuppertal
Di Girolamo Alessandro: CERN
Klimentov Alexei: Brookhaven National Laboratory
Lassnig Mario: CERN
Schulz Markus: CERN
Sciaba Andrea: CERN
Chuchuk Olga, Grigorieva Maria, Tretyakov Eugeny: no affiliation recorded
collection DOAJ
issn 2100-014X
description The ATLAS Experiment at the LHC generates petabytes of data that are distributed among 160 computing sites all over the world and processed continuously by various central production and user analysis tasks. The popularity of data is typically measured as the number of accesses and plays an important role in resolving data management issues: deleting, replicating, and moving data between tapes, disks and caches. These data management procedures are still carried out in a semi-manual mode, and we have now focused our efforts on automating them, making use of the historical knowledge about existing data management strategies. In this study we describe sources of information about data popularity and demonstrate their consistency. Based on the calculated popularity measurements, various distributions were obtained. Auxiliary information about replication and task processing allowed us to evaluate the correspondence between the number of tasks with popular data executed per site and the number of replicas per site. We also examine the popularity of user analysis data, which is much less predictable than that of central production and requires more indicators than just the number of accesses.