Support for configuration and provisioning of intermediate storage systems

This dissertation focuses on supporting the provisioning and configuration of distributed storage systems in clusters of computers that are designed to provide a high performance computing platform for batch applications. These platforms typically offer a centralized persistent backend storage syste...

Bibliographic Details
Main Author: Costa, Lauro Beltrão
Language: English
Published: University of British Columbia 2014
Online Access: http://hdl.handle.net/2429/51191
id ndltd-UBC-oai-circle.library.ubc.ca-2429-51191
record_format oai_dc
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-51191 2018-01-05T17:27:48Z
Support for configuration and provisioning of intermediate storage systems
Costa, Lauro Beltrão
This dissertation focuses on supporting the provisioning and configuration of distributed storage systems in clusters of computers that are designed to provide a high performance computing platform for batch applications. These platforms typically offer a centralized persistent backend storage system. To avoid the potential bottleneck of accessing the platform's backend storage system, intermediate storage systems aggregate resources allocated to the application to provide a shared temporary storage space dedicated to the application execution.
Configuring an intermediate storage system, however, becomes increasingly complex. As a distributed storage system, intermediate storage can employ a wide range of storage techniques that enable workload-dependent trade-offs over interrelated success metrics such as response time, throughput, storage space, and energy consumption. Because it is co-deployed with the application, it offers the user the opportunity to tailor its provisioning and configuration to extract the maximum performance from the infrastructure. For example, the user can optimize the performance by deciding the total number of nodes of an allocation, splitting these nodes, or not, between the application and the intermediate storage, and choosing the values for several configuration parameters for storage techniques with different trade-offs.
This dissertation targets the problem of supporting the configuration and provisioning of intermediate storage systems in the context of workflow-based scientific applications that communicate via files -- also known as many-task computing -- as well as checkpointing applications. Specifically, this study proposes performance prediction mechanisms to estimate performance of overall application or storage operations (e.g., an application turn-around time, application's energy consumption, or response time of write operations). By relying on the target application's characteristics, the proposed mechanisms can accelerate the exploration of the configuration space. The mechanisms use monitoring information available at the application level, not requiring changes to the storage system nor specialized monitoring systems. The effectiveness of these mechanisms is evaluated in a number of scenarios -- including different system scale, hardware platforms, and configuration choices. Overall, the mechanisms provide accuracy high enough to support the user's decisions about configuration and provisioning the storage system, while being 200x to 2000x less resource-intensive than running the actual applications.
Applied Science, Faculty of
Electrical and Computer Engineering, Department of
Graduate
2014-11-25T18:47:04Z 2014-11-25T18:47:04Z 2014 2015-02
Text Thesis/Dissertation
http://hdl.handle.net/2429/51191
eng
Attribution-NonCommercial-NoDerivs 2.5 Canada http://creativecommons.org/licenses/by-nc-nd/2.5/ca/
University of British Columbia
collection NDLTD
language English
sources NDLTD