Evolution of the ROOT Tree I/O

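The ROOT TTree data format encodes hundreds of petabytes of High Energy and Nuclear Physics events. Its columnar layout enables rapid analyses, because only the parts (“branches”) actually used in a given analysis need to be read from storage. Its unique feature is the seamless C++ integration, which allows users to directly store their event classes without explicitly defining data schemas. In this contribution, we present the status and plans of the future ROOT 7 event I/O. Along with the ROOT 7 interface modernization, we aim for robust and, where possible, compile-time-safe C++ interfaces to read and write event data. On the performance side, we show first benchmarks using ROOT’s new experimental I/O subsystem that combines the best of TTrees with recent advances in columnar data formats. A core ingredient is a strong separation of the high-level logical data layout (C++ classes) from the low-level physical data layout (storage-backed nested vectors of simple types). We show how the new, optimized physical data layout speeds up serialization and deserialization and facilitates parallel, vectorized and bulk operations. This lets ROOT I/O run optimally on the upcoming ultra-fast NVRAM storage devices, as well as file-less storage systems such as object stores.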

Bibliographic Details
Main Authors: Blomer Jakob, Canal Philippe, Naumann Axel, Piparo Danilo
Format: Article
Language: English
Published: EDP Sciences 2020-01-01
Series: EPJ Web of Conferences
Online Access: https://www.epj-conferences.org/articles/epjconf/pdf/2020/21/epjconf_chep2020_02030.pdf
id doaj-fe3ffdf8a5154600abedf34d52124d18
record_format Article
spelling doaj-fe3ffdf8a5154600abedf34d52124d18; 2021-08-02T16:10:57Z; eng; EDP Sciences; EPJ Web of Conferences; ISSN 2100-014X; 2020-01-01; volume 245, article 02030; DOI 10.1051/epjconf/202024502030; epjconf_chep2020_02030; Evolution of the ROOT Tree I/O; Blomer Jakob (CERN), Canal Philippe (Fermilab), Naumann Axel (CERN), Piparo Danilo (CERN)
collection DOAJ
language English
format Article
sources DOAJ
author Blomer Jakob
Canal Philippe
Naumann Axel
Piparo Danilo
title Evolution of the ROOT Tree I/O
publisher EDP Sciences
series EPJ Web of Conferences
issn 2100-014X
publishDate 2020-01-01
description The ROOT TTree data format encodes hundreds of petabytes of High Energy and Nuclear Physics events. Its columnar layout enables rapid analyses, because only the parts (“branches”) actually used in a given analysis need to be read from storage. Its unique feature is the seamless C++ integration, which allows users to directly store their event classes without explicitly defining data schemas. In this contribution, we present the status and plans of the future ROOT 7 event I/O. Along with the ROOT 7 interface modernization, we aim for robust and, where possible, compile-time-safe C++ interfaces to read and write event data. On the performance side, we show first benchmarks using ROOT’s new experimental I/O subsystem that combines the best of TTrees with recent advances in columnar data formats. A core ingredient is a strong separation of the high-level logical data layout (C++ classes) from the low-level physical data layout (storage-backed nested vectors of simple types). We show how the new, optimized physical data layout speeds up serialization and deserialization and facilitates parallel, vectorized and bulk operations. This lets ROOT I/O run optimally on the upcoming ultra-fast NVRAM storage devices, as well as file-less storage systems such as object stores.
url https://www.epj-conferences.org/articles/epjconf/pdf/2020/21/epjconf_chep2020_02030.pdf