Autonomic Failure Identification and Diagnosis for Building Dependable Cloud Computing Systems
The increasingly popular cloud-computing paradigm provides on-demand access to computing and storage with the appearance of unlimited resources. Users are given access to a variety of data and software utilities to manage their work. Users rent virtual resources and pay for only what they use. In sp...
Main Author: | Guan, Qiang |
---|---|
Other Authors: | Fu, Song |
Format: | Others |
Language: | English |
Published: | University of North Texas, 2014 |
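Subjects: | Cloud computing; failure identification; failure diagnosis; dependability; Intelligent agents (Computer software) |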
Online Access: | https://digital.library.unt.edu/ark:/67531/metadc499993/ |
id |
ndltd-unt.edu-info-ark-67531-metadc499993 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-unt.edu-info-ark-67531-metadc4999932017-03-17T08:41:08Z Autonomic Failure Identification and Diagnosis for Building Dependable Cloud Computing Systems Guan, Qiang Cloud computing failure identification failure diagnosis dependability Cloud computing. Intelligent agents (Computer software) The increasingly popular cloud-computing paradigm provides on-demand access to computing and storage with the appearance of unlimited resources. Users are given access to a variety of data and software utilities to manage their work. Users rent virtual resources and pay for only what they use. In spite of the many benefits that cloud computing promises, the lack of dependability in shared virtualized infrastructures is a major obstacle for its wider adoption, especially for mission-critical applications. Virtualization and multi-tenancy increase system complexity and dynamicity. They introduce new sources of failure degrading the dependability of cloud computing systems. To assure cloud dependability, in my dissertation research, I develop autonomic failure identification and diagnosis techniques that are crucial for understanding emergent, cloud-wide phenomena and self-managing resource burdens for cloud availability and productivity enhancement. We study the runtime cloud performance data collected from a cloud test-bed and by using traces from production cloud systems. We define cloud signatures including those metrics that are most relevant to failure instances. We exploit profiled cloud performance data in both time and frequency domain to identify anomalous cloud behaviors and leverage cloud metric subspace analysis to automate the diagnosis of observed failures. We implement a prototype of the anomaly identification system and conduct the experiments in an on-campus cloud computing test-bed and by using the Google datacenter traces. 
Our experimental results show that our proposed anomaly detection mechanism can achieve 93% detection sensitivity while keeping the false positive rate as low as 6.1% and outperform other tested anomaly detection schemes. In addition, the anomaly detector adapts itself by recursively learning from these newly verified detection results to refine future detection. University of North Texas Fu, Song Huang, Yan Kavi, Krishna M. Yuan, Xiaohui 2014-05 Thesis or Dissertation xiii, 121 pages : illustrations (chiefly color) Text https://digital.library.unt.edu/ark:/67531/metadc499993/ ark: ark:/67531/metadc499993 English Public Guan, Qiang Copyright Copyright is held by the author, unless otherwise noted. All rights Reserved. |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Cloud computing failure identification failure diagnosis dependability Cloud computing. Intelligent agents (Computer software) |
spellingShingle |
Cloud computing failure identification failure diagnosis dependability Cloud computing. Intelligent agents (Computer software) Guan, Qiang Autonomic Failure Identification and Diagnosis for Building Dependable Cloud Computing Systems |
description |
The increasingly popular cloud-computing paradigm provides on-demand access to computing and storage with the appearance of unlimited resources. Users are given access to a variety of data and software utilities to manage their work. Users rent virtual resources and pay for only what they use. In spite of the many benefits that cloud computing promises, the lack of dependability in shared virtualized infrastructures is a major obstacle for its wider adoption, especially for mission-critical applications. Virtualization and multi-tenancy increase system complexity and dynamicity. They introduce new sources of failure degrading the dependability of cloud computing systems. To assure cloud dependability, in my dissertation research, I develop autonomic failure identification and diagnosis techniques that are crucial for understanding emergent, cloud-wide phenomena and self-managing resource burdens for cloud availability and productivity enhancement. We study the runtime cloud performance data collected from a cloud test-bed and by using traces from production cloud systems. We define cloud signatures including those metrics that are most relevant to failure instances. We exploit profiled cloud performance data in both time and frequency domain to identify anomalous cloud behaviors and leverage cloud metric subspace analysis to automate the diagnosis of observed failures. We implement a prototype of the anomaly identification system and conduct the experiments in an on-campus cloud computing test-bed and by using the Google datacenter traces. Our experimental results show that our proposed anomaly detection mechanism can achieve 93% detection sensitivity while keeping the false positive rate as low as 6.1% and outperform other tested anomaly detection schemes. In addition, the anomaly detector adapts itself by recursively learning from these newly verified detection results to refine future detection. |
author2 |
Fu, Song |
author_facet |
Fu, Song Guan, Qiang |
author |
Guan, Qiang |
author_sort |
Guan, Qiang |
title |
Autonomic Failure Identification and Diagnosis for Building Dependable Cloud Computing Systems |
title_sort |
autonomic failure identification and diagnosis for building dependable cloud computing systems |
publisher |
University of North Texas |
publishDate |
2014 |
url |
https://digital.library.unt.edu/ark:/67531/metadc499993/ |
work_keys_str_mv |
AT guanqiang autonomicfailureidentificationanddiagnosisforbuildingdependablecloudcomputingsystems |
_version_ |
1718432218510000128 |