Incremental Prediction Model of Disk Failures Based on the Density Metric of Edge Samples

Disks are the main equipment for data storage in data centers, and predicting disk failures is of great significance for the reliability and security of data. Because disk datasets contain few abnormal samples, it is difficult to satisfy the requirement of supervised and semi-supervised a...

Bibliographic Details
Main Authors:	Xin Gao, Sen Zha, Xinpeng Li, Bo Yan, Xiao Jing, Junliang Li, Jianhang Xu
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Disk failures prediction; nearest neighbor; edge density metric; incremental learning
Online Access:	https://ieeexplore.ieee.org/document/8801827/

id	doaj-d9c7b9d9b2414a9da50b1d65e3c1ff1a
record_format	Article
spelling	doaj-d9c7b9d9b2414a9da50b1d65e3c1ff1a; 2021-04-05T17:28:44Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2019-01-01; vol. 7, pp. 114285-114296; DOI 10.1109/ACCESS.2019.2935628; document 8801827; Incremental Prediction Model of Disk Failures Based on the Density Metric of Edge Samples; Authors: Xin Gao [0] (https://orcid.org/0000-0002-7760-0915), Sen Zha [1], Xinpeng Li [2], Bo Yan [3], Xiao Jing [4], Junliang Li [5], Jianhang Xu [6]; Affiliations: [0] School of Automation, Beijing University of Posts and Telecommunications, Beijing, China; [1] School of Automation, Beijing University of Posts and Telecommunications, Beijing, China; [2] School of Automation, Beijing University of Posts and Telecommunications, Beijing, China; [3] State Grid Jibei Electric Power Company Limited, Beijing, China; [4] School of Automation, Beijing University of Posts and Telecommunications, Beijing, China; [5] Nari Group Corporation (State Grid Electric Power Research Institute), Beijing, China; [6] Nari Group Corporation (State Grid Electric Power Research Institute), Beijing, China; https://ieeexplore.ieee.org/document/8801827/; Disk failures prediction; nearest neighbor; edge density metric; incremental learning
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Xin Gao, Sen Zha, Xinpeng Li, Bo Yan, Xiao Jing, Junliang Li, Jianhang Xu
author_sort	Xin Gao
title	Incremental Prediction Model of Disk Failures Based on the Density Metric of Edge Samples
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2019-01-01
description	Disks are the main equipment for data storage in data centers, and predicting disk failures is of great significance for the reliability and security of data. Because disk datasets contain few abnormal samples, they rarely provide the number of abnormal samples that supervised and semi-supervised algorithms require, while unsupervised algorithms achieve poor recall on local anomalies and wrapped anomalies. This paper presents an incremental-learning disk failure prediction model based on the density metric of edge samples. An isolation region is built for each sample by searching for its nearest neighbor. For a test point that is not a global anomaly, we find its nearest training point, and then the nearest training point of that training point, using Euclidean distance. The global metric of the test sample's abnormal degree is the ratio of the radii of the regions in which these two nearest training points are located. The local metric of the test sample's abnormal degree is the ratio between the shortest distance from the test point to the edge of the training point's region and the radius of that region. The abnormal score of a test point is obtained by combining the two metrics. We identify the SMART attributes that are significantly related to disk failures and increase their weights the next time the attributes are input. Experiments are carried out on synthetic and public datasets that contain local anomalies and wrapped anomalies. The proposed method outperforms typical unsupervised algorithms such as iNNE, iForest, and LOF, improving recall by up to 7%. Comparison tests on public disk datasets also confirm that the proposed method achieves better recall.
topic	Disk failures prediction; nearest neighbor; edge density metric; incremental learning
url	https://ieeexplore.ieee.org/document/8801827/

Similar Items