Design of a Resilient, High-Throughput, Persistent Storage System for the ATLAS Phase-II DAQ System

The ATLAS experiment will undergo a major upgrade to take advantage of the new conditions provided by the upgraded High-Luminosity LHC. The Trigger and Data Acquisition system (TDAQ) will record data at unprecedented rates: the detectors will be read out at 1 MHz, generating around 5 TB/s of data. The Dataflow system (DF), a component of TDAQ, introduces a novel design: readout data are buffered on persistent storage while the event filtering system analyses them, selecting 10,000 events per second for a total recorded throughput of around 60 GB/s. This approach decouples detector activity from the event selection process. New challenges then arise for DF: designing and implementing a distributed, reliable, persistent storage system that sustains several TB/s of aggregate throughput while providing tens of PB of capacity. In this paper we first describe some of the challenges DF faces: ensuring data safety within the limitations of persistent storage, indexing data at high granularity in a highly distributed system, and managing storage capacity at high performance. We then present the ongoing R&D addressing each of these challenges and show the performance achieved with a working prototype.
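As a quick consistency check of the quoted rates (the per-event sizes below are inferred from the abstract's figures, not stated there explicitly):

\[ \frac{5\,\mathrm{TB/s}}{1\,\mathrm{MHz}} \approx 5\,\mathrm{MB\ per\ event\ (readout)} \]
\[ 10^{4}\,\mathrm{events/s} \times \sim 6\,\mathrm{MB} \approx 60\,\mathrm{GB/s\ (recorded)} \]

Both results are consistent with the "around" qualifiers in the abstract.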


Bibliographic Details
Main Authors: Abed Abud, Adam; Bonaventura, Matias; Farina, Edoardo; Le Goff, Fabrice (European Laboratory for Particle Physics, CERN)
Format: Article
Language: English
Published: EDP Sciences, 2021-01-01
Series: EPJ Web of Conferences, vol. 251, article 04014
ISSN: 2100-014X
DOI: 10.1051/epjconf/202125104014
Online Access: https://www.epj-conferences.org/articles/epjconf/pdf/2021/05/epjconf_chep2021_04014.pdf