Scalable and robust compute capacity multiplexing in virtualized datacenters

Multi-tenant cloud computing datacenters run diverse workloads inside virtual machines (VMs) with time-varying resource demands. Compute capacity multiplexing systems dynamically manage the placement of VMs on physical machines to ensure that their resource demands are always met while simultaneou...

Bibliographic Details
Main Author: Kesavan, Mukil
Other Authors: Schwan, Karsten
Format: Others
Language: en_US
Published: Georgia Institute of Technology 2014
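Subjects: Distributed systems; Virtualization; Resource management; Fault tolerance; Function replication; Benchmarking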
Online Access: http://hdl.handle.net/1853/52229
id ndltd-GATECH-oai-smartech.gatech.edu-1853-52229
record_format oai_dc
spelling ndltd-GATECH-oai-smartech.gatech.edu-1853-52229; 2015-04-04T03:37:07Z; Scalable and robust compute capacity multiplexing in virtualized datacenters; Kesavan, Mukil; Distributed systems; Virtualization; Resource management; Fault tolerance; Function replication; Benchmarking; Georgia Institute of Technology; Schwan, Karsten; 2014-08-27T13:37:03Z; 2014-08-27T13:37:03Z; 2014-08; 2014-05-16; August 2014; 2014-08-27T13:37:03Z; Dissertation; application/pdf; http://hdl.handle.net/1853/52229; en_US
collection NDLTD
language en_US
format Others
sources NDLTD
topic Distributed systems
Virtualization
Resource management
Fault tolerance
Function replication
Benchmarking

description Multi-tenant cloud computing datacenters run diverse workloads inside virtual machines (VMs) with time-varying resource demands. Compute capacity multiplexing systems dynamically manage the placement of VMs on physical machines to ensure that their resource demands are always met while simultaneously optimizing the total datacenter compute capacity in use. In essence, they give the cloud its fundamental property of being able to expand and contract resources dynamically, on demand. In large-scale datacenters, though, designers of compute capacity multiplexing systems must deal with two practical realities: (a) maintaining low operational overhead given the variable cost of the management operations needed to allocate and multiplex resources, and (b) the prevalence of a large number and wide variety of faults, in hardware, in software, and due to human error, that impair multiplexing efficiency. In this thesis we propound the notion that explicitly designing the methods and abstractions used in capacity multiplexing systems for these realities is critical to better achieving administrator and customer goals at large scales. To this end, the thesis makes the following contributions: (i) CCM, a hierarchically organized compute capacity multiplexer that demonstrates that simple designs can be highly effective at multiplexing capacity with low overheads at large scales compared to complex alternatives; (ii) Xerxes, a distributed load generation framework for flexibly and reliably benchmarking compute capacity allocation and multiplexing systems; and (iii) a speculative virtualized infrastructure management stack that dynamically replicates management operations on virtualized entities, and a compute capacity multiplexer for this environment, which together provide fault-scalable management performance for a broad class of commonly occurring faults in large-scale datacenters.
Our systems have been implemented in an industry-strength cloud infrastructure built on top of the VMware vSphere virtualization platform and the popular open source OpenStack cloud computing platform, running ESXi and Xen hypervisors, respectively. Our experiments were conducted in a 700-server datacenter using the Xerxes benchmark to replay trace data from production clusters and to simulate parameterized scenarios such as flash crowds, and also using a suite of representative cloud applications. Results from these scenarios demonstrate the effectiveness of our design techniques in real-life, large-scale environments.
author2 Schwan, Karsten
author_facet Schwan, Karsten
Kesavan, Mukil
author Kesavan, Mukil
author_sort Kesavan, Mukil
title Scalable and robust compute capacity multiplexing in virtualized datacenters
publisher Georgia Institute of Technology
publishDate 2014
url http://hdl.handle.net/1853/52229