Disk failures in the EOS setup at CERN
The EOS deployment at CERN is a core service used for scientific data processing and analysis and as a back-end for general end-user storage (e.g. home directories/CERNBOX). The disk failure metrics collected over a period of 1 year from a deployment of some 70k disks allow a first systematic an...
Main Authors: | Duellmann Dirk, Portabales Alfonso |
---|---|
Format: | Article |
Language: | English |
Published: | EDP Sciences 2019-01-01 |
Series: | EPJ Web of Conferences |
Online Access: | https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_04046.pdf |
id |
doaj-ba1aba91cb0948f0bf84c2acbea0ccd0 |
---|---|
record_format |
Article |
spelling |
doaj-ba1aba91cb0948f0bf84c2acbea0ccd0 2021-08-02T10:03:34Z eng EDP Sciences EPJ Web of Conferences 2100-014X 2019-01-01 214 04046 10.1051/epjconf/201921404046 epjconf_chep2018_04046 Disk failures in the EOS setup at CERN Duellmann Dirk Portabales Alfonso The EOS deployment at CERN is a core service used for scientific data processing and analysis and as a back-end for general end-user storage (e.g. home directories/CERNBOX). The disk failure metrics collected over a period of 1 year from a deployment of some 70k disks allow a first systematic analysis of the behaviour of different hard disk types for the large CERN use cases. In this contribution we describe the data collection and analysis, summarise the measured rates and compare them with other large disk deployments. We further describe initial steps to use the collected failure and SMART metrics to develop a machine learning model that predicts imminent failures and hence avoids service degradation and repair costs. https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_04046.pdf |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Duellmann Dirk Portabales Alfonso |
spellingShingle |
Duellmann Dirk Portabales Alfonso Disk failures in the EOS setup at CERN EPJ Web of Conferences |
author_facet |
Duellmann Dirk Portabales Alfonso |
author_sort |
Duellmann Dirk |
title |
Disk failures in the EOS setup at CERN |
title_short |
Disk failures in the EOS setup at CERN |
title_full |
Disk failures in the EOS setup at CERN |
title_fullStr |
Disk failures in the EOS setup at CERN |
title_full_unstemmed |
Disk failures in the EOS setup at CERN |
title_sort |
disk failures in the eos setup at cern |
publisher |
EDP Sciences |
series |
EPJ Web of Conferences |
issn |
2100-014X |
publishDate |
2019-01-01 |
description |
The EOS deployment at CERN is a core service used for scientific data
processing and analysis and as a back-end for general end-user storage (e.g.
home directories/CERNBOX). The disk failure metrics collected over a period
of 1 year from a deployment of some 70k disks allow a first systematic
analysis of the behaviour of different hard disk types for the large CERN use cases.
In this contribution we describe the data collection and analysis, summarise
the measured rates and compare them with other large disk deployments. We
further describe initial steps to use the collected failure and SMART metrics to
develop a machine learning model that predicts imminent failures and hence avoids
service degradation and repair costs. |
url |
https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_04046.pdf |
work_keys_str_mv |
AT duellmanndirk diskfailuresintheeossetupatcern AT portabalesalfonso diskfailuresintheeossetupatcern |
_version_ |
1721234250821271552 |