DC-Prophet: Predicting Catastrophic Machine Failures in DataCenters


Bibliographic Details
Main Authors: Lee, You-Luen, 李侑倫
Other Authors: Chang, Shih-Chieh
Format: Others
Language: en_US
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/5qvsd3
id ndltd-TW-106NTHU5392008
record_format oai_dc
spelling ndltd-TW-106NTHU5392008 2019-05-16T00:00:23Z http://ndltd.ncl.edu.tw/handle/5qvsd3 DC-Prophet: Predicting Catastrophic Machine Failures in DataCenters 資料中心先知:精準預測災難性伺服器故障意外事件 Lee, You-Luen 李侑倫 Master's thesis, National Tsing Hua University, Department of Computer Science, 106 (ROC academic year) Chang, Shih-Chieh 張世杰 2017 學位論文 ; thesis 31 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === National Tsing Hua University === Department of Computer Science === 106 (ROC academic year) === When will a server fail catastrophically in an industrial datacenter? Is it possible to forecast these failures so that preventive actions can be taken to increase the reliability of a datacenter? To answer these questions, we have studied what are probably the largest publicly available datacenter traces, containing more than 104 million events from 12,500 machines. Among these samples, we observe and categorize three types of machine failures, all of which are catastrophic and may lead to information loss or, even worse, reliability degradation of a datacenter. We further propose a two-stage framework, DC-Prophet, based on One-Class Support Vector Machine and Random Forest. DC-Prophet extracts surprising patterns and accurately predicts the next failure of a machine. Experimental results show that DC-Prophet achieves an AUC of 0.93 in predicting the next machine failure, and an F3-score of 0.88 (out of 1). On average, DC-Prophet outperforms other classical machine learning methods by 39.45% in F3-score.
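The abstract reports an F3-score without defining it. As a reading aid, the sketch below computes the standard F-beta score, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which with beta = 3 weights recall (missed failures) more heavily than precision (false alarms). The function names and the confusion-matrix counts are hypothetical illustrations and are not taken from the thesis.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 3.0) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 favours recall; beta = 3 gives the F3-score.
    """
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical counts (NOT results from the thesis):
# 90 failures caught, 60 false alarms, 10 failures missed.
p, r = precision_recall(tp=90, fp=60, fn=10)
f1 = f_beta(p, r, beta=1.0)
f3 = f_beta(p, r, beta=3.0)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.3f} F3={f3:.3f}")
```

With these illustrative counts, recall is higher than precision, so the F3-score comes out above the F1-score: the metric rewards a predictor that catches most failures even at the cost of some false alarms.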
author2 Chang, Shih-Chieh
author_facet Chang, Shih-Chieh
Lee, You-Luen
李侑倫
author Lee, You-Luen
李侑倫
author_sort Lee, You-Luen
title DC-Prophet: Predicting Catastrophic Machine Failures in DataCenters
publishDate 2017
url http://ndltd.ncl.edu.tw/handle/5qvsd3