Enabling scalable self-management for enterprise-scale systems

Bibliographic Details
Main Author: Kumar, Vibhore
Published: Georgia Institute of Technology 2008
Online Access: http://hdl.handle.net/1853/24788
Description
Summary: Implementing self-management for enterprise systems is difficult. First, the scale and complexity of such systems make it hard to understand and interpret system behavior or, worse, the root causes of certain behaviors. Second, it is not clear how goals specified at the system level translate into the component-level actions that drive the system. Third, the dynamic environments in which such systems operate require self-management techniques that not only adapt the system but also adapt their own decision-making processes. Finally, a self-management solution that is acceptable to administrators should have the properties of tractability and trust, which allow an administrator both to understand and to fine-tune self-management actions.

This dissertation introduces, implements, and evaluates iManage, a novel framework, based on the system state-space, for enabling self-management of enterprise-scale systems. In iManage, the system state-space is defined as a collection of monitored system parameters and metrics, termed system variables. From among the system variables, iManage further identifies the variables of interest, which determine the operational status of a system, and the controllable variables, which can be deterministically modified to affect that operational status. Using this formal representation, we have developed and integrated into iManage techniques that establish a probabilistic model relating the variables of interest to the controllable variables under the prevailing operational conditions. iManage then uses these models to determine corrective actions in case of SLA violations and/or to determine per-component ranges for the controllable variables which, if independently adhered to by each component, lead to SLA compliance.
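To make the state-space vocabulary concrete, the following Python sketch shows one simple way a probabilistic model could relate a controllable variable to a variable of interest. It is a hypothetical illustration, not iManage's implementation: the variable names (`thread_pool_size`, `response_time_ms`), the SLA threshold, and the binned frequency estimate of P(SLA met | controllable-variable range) are all assumptions chosen for brevity, since the summary does not specify the modeling technique used.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical SLA threshold on the variable of interest (not from iManage).
SLA_RESPONSE_MS = 200.0


@dataclass(frozen=True)
class Sample:
    """One monitored point in the system state-space (variable names are assumed)."""
    thread_pool_size: int    # controllable variable
    response_time_ms: float  # variable of interest


def compliance_model(samples, bin_width):
    """Estimate P(SLA met | controllable variable in bin) from observed samples.

    Returns a dict mapping each bin's lower bound to its empirical probability.
    """
    met = defaultdict(int)
    total = defaultdict(int)
    for s in samples:
        b = s.thread_pool_size // bin_width
        total[b] += 1
        if s.response_time_ms <= SLA_RESPONSE_MS:
            met[b] += 1
    return {b * bin_width: met[b] / total[b] for b in sorted(total)}


def recommend_range(model, bin_width):
    """Corrective action: the controllable-variable range most likely to meet the SLA."""
    lo, prob = max(model.items(), key=lambda kv: kv[1])
    return (lo, lo + bin_width - 1, prob)


samples = [Sample(4, 350), Sample(6, 300), Sample(10, 180), Sample(12, 150),
           Sample(14, 210), Sample(20, 120), Sample(22, 130)]
model = compliance_model(samples, bin_width=8)
print(model)
print(recommend_range(model, bin_width=8))
```

A real deployment would condition on several controllable variables and on the prevailing operating conditions; the binning here only shows how an observed compliance probability can be turned into a recommended range for a controllable variable.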
To address the issue of scale in determining system models, iManage uses a novel scheme that partitions the state-space into smaller sub-spaces, thereby allowing us to model the critical system aspects more precisely. Our chosen modeling techniques produce models that the administrator can easily understand and modify. Furthermore, iManage associates each proposed self-management action with a confidence attribute that determines whether the action merits autonomic enforcement.
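The partitioning and confidence ideas can be sketched in the same hedged way. In this hypothetical Python example, observations are split into sub-spaces by an assumed operating-condition variable (`arrival_rate`), a best setting for an assumed controllable variable (`cache_mb`) is chosen separately within each sub-space, and each proposed action carries a confidence value. The confidence formula (observed success rate, discounted when a setting has fewer than `min_samples` observations) and the 0.8 enforcement threshold are illustrative assumptions, not iManage's definitions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One monitored state (all variable names are hypothetical)."""
    arrival_rate: float  # operating condition used to partition the state-space
    cache_mb: int        # controllable variable
    sla_met: bool        # outcome on the variable of interest


@dataclass(frozen=True)
class Action:
    cache_mb: int
    confidence: float    # hypothetical confidence attribute in [0, 1]


def partition(observations, rate_edges):
    """Split observations into sub-spaces by arrival-rate band.

    `rate_edges` must be sorted ascending; the band index is the number of
    edges the rate meets or exceeds.
    """
    parts = defaultdict(list)
    for o in observations:
        band = sum(o.arrival_rate >= e for e in rate_edges)
        parts[band].append(o)
    return dict(parts)


def propose_action(sub_space, min_samples):
    """Choose the cache setting with the best observed SLA record in one sub-space."""
    stats = defaultdict(lambda: [0, 0])  # cache_mb -> [times SLA met, total]
    for o in sub_space:
        stats[o.cache_mb][0] += int(o.sla_met)
        stats[o.cache_mb][1] += 1
    best, (met, total) = max(stats.items(), key=lambda kv: kv[1][0] / kv[1][1])
    # Assumed confidence: success rate, discounted when evidence is thin.
    confidence = (met / total) * min(1.0, total / min_samples)
    return Action(best, confidence)


def enforce_autonomically(action, threshold=0.8):
    """True if the action is trusted enough to apply without an administrator."""
    return action.confidence >= threshold


observations = [
    Observation(50, 256, True), Observation(60, 256, True), Observation(70, 512, True),
    Observation(500, 256, False), Observation(550, 512, True),
    Observation(600, 512, True), Observation(650, 512, True), Observation(700, 256, True),
]
parts = partition(observations, rate_edges=[100.0])
for band, sub_space in sorted(parts.items()):
    action = propose_action(sub_space, min_samples=3)
    print(band, action, enforce_autonomically(action))
```

Actions below the threshold would be surfaced to the administrator rather than applied automatically; exposing the threshold as a parameter is one way to reflect the summary's emphasis on letting administrators fine-tune self-management behavior.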