Summary: | Emerging trends in Cloud computing bring numerous benefits, such as higher performance, fast and flexible provisioning of applications and capacities, lower infrastructure costs, and
almost unlimited scalability. However, the increasing complexity of automated performance and resource management
for applications in Cloud computing presents novel challenges that demand enhancements to classical control-based approaches.
An important challenge that Cloud service providers often face is a resource sharing dilemma under
workload variation. Cloud service providers pursue higher resource utilization because higher utilization lowers hardware, operating, and maintenance costs.
On the other hand, resource utilization cannot be too high, or the service provider's revenue could be jeopardized by the inability to meet application-level service-level objectives (SLOs).
A crucial research question is how to generate as much revenue as possible by satisfying service-level agreements
while reducing costs as much as possible in order to maximize the profit for Cloud service providers.
To this end, classical control-based approaches, which can be classified into three major categories (admission control, queueing and scheduling, and resource allocation), show great potential to address the resource sharing dilemma. However, it is challenging to apply classical control-based approaches directly to computer systems, where first-principles models are generally not available. The task becomes even more difficult because of the dynamics seen in real computer systems, including workload variations, multi-tier dependencies, and resource bottleneck shifts.
The main contribution of this thesis is to enhance classical control-based approaches with complementary techniques
in order to address the increasing complexity of automated performance and resource management in the Cloud
through dynamic monitoring, modeling, and management of performance and resources.
More specifically, (1) an admission control approach
is enhanced by leveraging decision theory to achieve the most profitable service-level compliance;
(2) a critical resource identification approach
is enhanced by leveraging statistical machine learning to automatically and adaptively identify critical resources;
and (3) a resource allocation approach
is enhanced by leveraging hierarchical resource management to achieve the highest resource utilization.
Concretely, the enhanced control-based approaches are implemented in
a collection of real control systems: ActiveSLA, vPerfGuard and ERController.
The control systems are applied to different real applications, such as OLTP and OLAP database applications and distributed multi-tier web applications, with different workload intensities, types, and mixes, in different Cloud environments.
The experimental results consistently show that the prototype control systems outperform existing classical control-based approaches.
Finally, this thesis opens new avenues to address the increasing complexity of automated performance and resource management
through the enhancement of classical control-based approaches in Cloud environments. Future work
will continue in this direction to address the challenges that arise with the advent of new hardware technology, new software frameworks, and new computing paradigms.
|