Scalable and robust compute capacity multiplexing in virtualized datacenters

Multi-tenant cloud computing datacenters run diverse workloads inside virtual machines (VMs) with time-varying resource demands. Compute capacity multiplexing systems dynamically manage the placement of VMs on physical machines to ensure that their resource demands are always met while simultaneou...

Bibliographic Details
Main Author:	Kesavan, Mukil
Other Authors:	Schwan, Karsten
Format:	Others
Language:	en_US
Published:	Georgia Institute of Technology 2014
Subjects:	Distributed systems; Virtualization; Resource management; Fault tolerance; Function replication; Benchmarking
Online Access:	http://hdl.handle.net/1853/52229

id	ndltd-GATECH-oai-smartech.gatech.edu-1853-52229
record_format	oai_dc
spelling	ndltd-GATECH-oai-smartech.gatech.edu-1853-52229; 2015-04-04T03:37:07Z; Georgia Institute of Technology; Schwan, Karsten; 2014-08-27T13:37:03Z; 2014-08; 2014-05-16; August 2014; Dissertation; application/pdf; http://hdl.handle.net/1853/52229; en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
topic	Distributed systems; Virtualization; Resource management; Fault tolerance; Function replication; Benchmarking
description	Multi-tenant cloud computing datacenters run diverse workloads inside virtual machines (VMs) with time-varying resource demands. Compute capacity multiplexing systems dynamically manage the placement of VMs on physical machines to ensure that their resource demands are always met while simultaneously optimizing the total datacenter compute capacity in use. In essence, they give the cloud its fundamental property of being able to expand and contract resources dynamically, on demand. In large-scale datacenters, though, designers of compute capacity multiplexing systems must deal with two practical realities: (a) maintaining low operational overhead given the variable cost of the management operations needed to allocate and multiplex resources, and (b) the prevalence of a large number and wide variety of faults, in hardware, in software, and due to human error, that impair multiplexing efficiency. In this thesis we propound the notion that explicitly designing the methods and abstractions used in capacity multiplexing systems for these realities is critical to better achieving administrator and customer goals at large scales. To this end, the thesis makes the following contributions: (i) CCM, a hierarchically organized compute capacity multiplexer that demonstrates that simple designs can be highly effective at multiplexing capacity with low overheads at large scales compared to complex alternatives; (ii) Xerxes, a distributed load generation framework for flexibly and reliably benchmarking compute capacity allocation and multiplexing systems; and (iii) a speculative virtualized infrastructure management stack that dynamically replicates management operations on virtualized entities, and a compute capacity multiplexer for this environment, which together provide fault-scalable management performance for a broad class of commonly occurring faults in large-scale datacenters.
Our systems have been implemented in an industry-strength cloud infrastructure built on top of the VMware vSphere virtualization platform and the popular open-source OpenStack cloud computing platform, running ESXi and Xen hypervisors, respectively. Our experiments were conducted in a 700-server datacenter using the Xerxes benchmark to replay trace data from production clusters and to simulate parameterized scenarios such as flash crowds, and also using a suite of representative cloud applications. Results from these scenarios demonstrate the effectiveness of our design techniques in real-life, large-scale environments.
author2	Schwan, Karsten
author_facet	Schwan, Karsten Kesavan, Mukil
author	Kesavan, Mukil
author_sort	Kesavan, Mukil
title	Scalable and robust compute capacity multiplexing in virtualized datacenters
publisher	Georgia Institute of Technology
publishDate	2014
url	http://hdl.handle.net/1853/52229

Similar Items