Disk failures in the EOS setup at CERN

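The EOS deployment at CERN is a core service used both for scientific data processing and analysis and as a back-end for general end-user storage (e.g. home directories/CERNBOX). The disk failure metrics collected over a period of one year from a deployment of some 70k disks allow a first systematic analysis of the behaviour of different hard disk types for the large CERN use cases. In this contribution we describe the data collection and analysis, summarise the measured rates and compare them with other large disk deployments. We further describe initial steps to use the collected failure and SMART metrics to develop a machine learning model predicting imminent failures and hence to avoid service degradation and repair costs.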

Bibliographic Details
Main Authors: Duellmann, Dirk; Portabales, Alfonso
Format: Article
Language: English
Published: EDP Sciences 2019-01-01
Series: EPJ Web of Conferences
Online Access: https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_04046.pdf
id doaj-ba1aba91cb0948f0bf84c2acbea0ccd0
record_format Article
spelling doaj-ba1aba91cb0948f0bf84c2acbea0ccd0 | 2021-08-02T10:03:34Z | eng | EDP Sciences | EPJ Web of Conferences | 2100-014X | 2019-01-01 | 214 | 04046 | 10.1051/epjconf/201921404046 | epjconf_chep2018_04046 | Disk failures in the EOS setup at CERN | Duellmann Dirk | Portabales Alfonso | The EOS deployment at CERN is a core service used both for scientific data processing and analysis and as a back-end for general end-user storage (e.g. home directories/CERNBOX). The disk failure metrics collected over a period of one year from a deployment of some 70k disks allow a first systematic analysis of the behaviour of different hard disk types for the large CERN use cases. In this contribution we describe the data collection and analysis, summarise the measured rates and compare them with other large disk deployments. We further describe initial steps to use the collected failure and SMART metrics to develop a machine learning model predicting imminent failures and hence to avoid service degradation and repair costs. | https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_04046.pdf
collection DOAJ
language English
format Article
sources DOAJ
author Duellmann Dirk
Portabales Alfonso
spellingShingle Duellmann Dirk
Portabales Alfonso
Disk failures in the EOS setup at CERN
EPJ Web of Conferences
author_facet Duellmann Dirk
Portabales Alfonso
author_sort Duellmann Dirk
title Disk failures in the EOS setup at CERN
title_short Disk failures in the EOS setup at CERN
title_full Disk failures in the EOS setup at CERN
title_fullStr Disk failures in the EOS setup at CERN
title_full_unstemmed Disk failures in the EOS setup at CERN
title_sort disk failures in the eos setup at cern
publisher EDP Sciences
series EPJ Web of Conferences
issn 2100-014X
publishDate 2019-01-01
description The EOS deployment at CERN is a core service used both for scientific data processing and analysis and as a back-end for general end-user storage (e.g. home directories/CERNBOX). The disk failure metrics collected over a period of one year from a deployment of some 70k disks allow a first systematic analysis of the behaviour of different hard disk types for the large CERN use cases. In this contribution we describe the data collection and analysis, summarise the measured rates and compare them with other large disk deployments. We further describe initial steps to use the collected failure and SMART metrics to develop a machine learning model predicting imminent failures and hence to avoid service degradation and repair costs.
url https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_04046.pdf
work_keys_str_mv AT duellmanndirk diskfailuresintheeossetupatcern
AT portabalesalfonso diskfailuresintheeossetupatcern
_version_ 1721234250821271552