Autonomic Failure Identification and Diagnosis for Building Dependable Cloud Computing Systems

The increasingly popular cloud-computing paradigm provides on-demand access to computing and storage with the appearance of unlimited resources. Users are given access to a variety of data and software utilities to manage their work. Users rent virtual resources and pay for only what they use. In sp...

Bibliographic Details
Main Author: Guan, Qiang
Other Authors: Fu, Song
Format: Others
Language: English
Published: University of North Texas 2014
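Subjects: Cloud computing; failure identification; failure diagnosis; dependability; Intelligent agents (Computer software)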
Online Access: https://digital.library.unt.edu/ark:/67531/metadc499993/
id ndltd-unt.edu-info-ark-67531-metadc499993
record_format oai_dc
spelling ndltd-unt.edu-info-ark-67531-metadc4999932017-03-17T08:41:08Z Autonomic Failure Identification and Diagnosis for Building Dependable Cloud Computing Systems Guan, Qiang Cloud computing failure identification failure diagnosis dependability Cloud computing. Intelligent agents (Computer software) University of North Texas Fu, Song Huang, Yan Kavi, Krishna M. Yuan, Xiaohui 2014-05 Thesis or Dissertation xiii, 121 pages : illustrations (chiefly color) Text https://digital.library.unt.edu/ark:/67531/metadc499993/ ark: ark:/67531/metadc499993 English Public Guan, Qiang Copyright Copyright is held by the author, unless otherwise noted. All rights Reserved.
collection NDLTD
language English
format Others
sources NDLTD
topic Cloud computing
failure identification
failure diagnosis
dependability
Cloud computing.
Intelligent agents (Computer software)
description The increasingly popular cloud-computing paradigm provides on-demand access to computing and storage with the appearance of unlimited resources. Users are given access to a variety of data and software utilities to manage their work. Users rent virtual resources and pay for only what they use. In spite of the many benefits that cloud computing promises, the lack of dependability in shared virtualized infrastructures is a major obstacle to its wider adoption, especially for mission-critical applications. Virtualization and multi-tenancy increase system complexity and dynamicity. They introduce new sources of failure that degrade the dependability of cloud computing systems. To assure cloud dependability, in my dissertation research, I develop autonomic failure identification and diagnosis techniques that are crucial for understanding emergent, cloud-wide phenomena and self-managing resource burdens for cloud availability and productivity enhancement. We study runtime cloud performance data collected from a cloud test-bed and traces from production cloud systems. We define cloud signatures that include the metrics most relevant to failure instances. We exploit profiled cloud performance data in both the time and frequency domains to identify anomalous cloud behaviors, and we leverage cloud metric subspace analysis to automate the diagnosis of observed failures. We implement a prototype of the anomaly identification system and conduct experiments on an on-campus cloud computing test-bed and with the Google datacenter traces. Our experimental results show that the proposed anomaly detection mechanism achieves 93% detection sensitivity while keeping the false positive rate as low as 6.1%, and that it outperforms the other tested anomaly detection schemes. In addition, the anomaly detector adapts itself by recursively learning from newly verified detection results to refine future detection.
author2 Fu, Song
author_facet Fu, Song
Guan, Qiang
author Guan, Qiang
author_sort Guan, Qiang
title Autonomic Failure Identification and Diagnosis for Building Dependable Cloud Computing Systems
publisher University of North Texas
publishDate 2014
url https://digital.library.unt.edu/ark:/67531/metadc499993/