Support for configuration and provisioning of intermediate storage systems

Bibliographic Details
Main Author: Costa, Lauro Beltrão
Language: English
Published: University of British Columbia 2014
Online Access: http://hdl.handle.net/2429/51191
Description
Summary: This dissertation focuses on supporting the provisioning and configuration of distributed storage systems in clusters of computers that are designed to provide a high-performance computing platform for batch applications. These platforms typically offer a centralized persistent backend storage system. To avoid the potential bottleneck of accessing the platform's backend storage system, intermediate storage systems aggregate resources allocated to the application to provide a shared temporary storage space dedicated to the application's execution. Configuring an intermediate storage system, however, becomes increasingly complex. As a distributed storage system, intermediate storage can employ a wide range of storage techniques that enable workload-dependent trade-offs among interrelated success metrics such as response time, throughput, storage space, and energy consumption. Because it is co-deployed with the application, it offers the user the opportunity to tailor its provisioning and configuration to extract the maximum performance from the infrastructure. For example, the user can optimize performance by deciding the total number of nodes in an allocation, whether to split these nodes between the application and the intermediate storage, and the values of several configuration parameters for storage techniques with different trade-offs. This dissertation targets the problem of supporting the configuration and provisioning of intermediate storage systems in the context of workflow-based scientific applications that communicate via files (also known as many-task computing), as well as checkpointing applications. Specifically, this study proposes performance prediction mechanisms that estimate the performance of the overall application or of storage operations (e.g., an application's turnaround time, an application's energy consumption, or the response time of write operations).
By relying on the target application's characteristics, the proposed mechanisms can accelerate the exploration of the configuration space. The mechanisms use monitoring information available at the application level and require neither changes to the storage system nor specialized monitoring systems. The effectiveness of these mechanisms is evaluated in a number of scenarios, including different system scales, hardware platforms, and configuration choices. Overall, the mechanisms provide accuracy high enough to support the user's decisions about configuring and provisioning the storage system, while being 200x to 2000x less resource-intensive than running the actual applications.
Applied Science, Faculty of; Electrical and Computer Engineering, Department of; Graduate