Efficient Scientific Workflow Scheduling in Cloud Environment

Bibliographic Details
Main Author: Cao, Fei
Format: Others
Published: OpenSIUC 2014
Subjects:
Online Access:https://opensiuc.lib.siu.edu/dissertations/802
https://opensiuc.lib.siu.edu/cgi/viewcontent.cgi?article=1805&context=dissertations
Description
Summary: Cloud computing enables the delivery of remote computing, software, and storage services through web browsers following a pay-as-you-go model. In addition to successful commercial applications, many research efforts, including the DOE Magellan Cloud project, focus on discovering the opportunities and challenges arising from computing- and data-intensive scientific applications that are not well addressed by current supercomputers, Linux clusters, and Grid technologies. The elastic resource provisioning, non-interfering resource sharing, and flexible customized configuration provided by Cloud infrastructure have shed light on the efficient execution of many scientific applications modeled as Directed Acyclic Graph (DAG) structured workflows, which capture the intricate dependencies among a large number of different processing tasks. Meanwhile, the Cloud environment poses various challenges. Cloud providers and Cloud users pursue different goals: providers aim to maximize profit by achieving higher resource utilization, while users want to minimize expenses while meeting their performance requirements. Moreover, owing to expanding Cloud services and emerging technologies, the ever-increasing heterogeneity of the Cloud environment complicates the challenges for both parties. In this thesis, we address the workflow scheduling problem for different classes of applications and various objectives.

For batch applications, the increasing deployment of data centers and servers around the globe, compounded by rising electricity prices, has caused the energy cost of computing, communication, and cooling, along with the associated CO2 emissions, to skyrocket. In order to sustain Cloud computing in the face of ever-increasing problem complexity and data sizes over the coming decades, we design and develop an energy-aware scientific workflow scheduling algorithm that minimizes energy consumption and CO2 emission while still satisfying Quality of Service (QoS) requirements, such as the response time specified in a Service Level Agreement (SLA). Furthermore, the availability of the underlying Cloud hardware and Virtual Machine (VM) resources is time-dependent because of the dual operation modes, namely on-demand and reserved instances, at various Cloud data centers. We also apply techniques such as Dynamic Voltage and Frequency Scaling (DVFS) and the DNS scheme to further reduce energy consumption within acceptable performance bounds. Our multi-step resource provisioning and allocation algorithm meets the response time requirement in the forward task scheduling step, and minimizes VM overhead, for reduced energy consumption and a higher resource utilization rate, in the backward task scheduling step. We also evaluate the candidacy of multiple data centers from the energy and performance efficiency perspectives, as different data centers have different energy- and cost-related parameters.

For streaming applications, we formulate scheduling problems with two different objectives: one is to maximize throughput under a budget constraint, and the other is to minimize execution cost under a minimum throughput constraint. Two algorithms, Budget-constrained RATE (B-RATE) and Budget-constrained SWAP (B-SWAP), are designed for the first objective; another two algorithms, Throughput-constrained RATE (TP-RATE) and Throughput-constrained SWAP (TP-SWAP), are developed for the second objective.
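
The abstract itself gives no pseudocode, but the forward/backward two-step idea can be illustrated with a minimal Python sketch. This is only an illustration under assumed inputs, not the dissertation's actual algorithm: the workflow tasks, runtimes, and deadline below are hypothetical, a forward pass checks that the response-time SLA is achievable, and a backward pass exposes slack that is then used to pack tasks onto fewer VM instances.

# Illustrative two-step DAG scheduling sketch (hypothetical data, not the
# dissertation's algorithm): forward pass for the SLA, backward pass + greedy
# consolidation to reduce the number of leased VMs.

# Hypothetical DAG workflow: task -> (runtime in hours, predecessor tasks)
workflow = {
    "ingest":  (1.0, []),
    "filterA": (2.0, ["ingest"]),
    "filterB": (1.5, ["ingest"]),
    "analyze": (2.5, ["filterA", "filterB"]),
    "report":  (0.5, ["analyze"]),
}
deadline = 8.0  # assumed SLA response-time bound, in hours

def topo_order(wf):
    """Return tasks in dependency (topological) order."""
    order, seen = [], set()
    def visit(t):
        if t in seen:
            return
        for p in wf[t][1]:
            visit(p)
        seen.add(t)
        order.append(t)
    for t in wf:
        visit(t)
    return order

order = topo_order(workflow)

# Forward step: earliest start/finish times; verifies the SLA is achievable.
est, eft = {}, {}
for t in order:
    runtime, preds = workflow[t]
    est[t] = max((eft[p] for p in preds), default=0.0)
    eft[t] = est[t] + runtime
assert max(eft.values()) <= deadline, "forward pass cannot meet the SLA"

# Backward step: latest start/finish times under the deadline, i.e. the slack.
lst, lft = {}, {}
for t in reversed(order):
    succs = [s for s in workflow if t in workflow[s][1]]
    lft[t] = min((lst[s] for s in succs), default=deadline)
    lst[t] = lft[t] - workflow[t][0]

# Consolidation: process tasks in dependency order and reuse an existing VM
# whenever the task can still start by its latest start time on it; lease a
# new VM only when no existing one fits.  Fewer VMs -> lower overhead/energy.
vms = []       # vms[i] = time at which VM i becomes free
finish = {}    # actual scheduled finish time of each task
for t in order:
    runtime, preds = workflow[t]
    ready = max((finish[p] for p in preds), default=0.0)
    usable = [i for i, free in enumerate(vms) if max(free, ready) <= lst[t]]
    if usable:
        i = max(usable, key=lambda k: vms[k])  # pack onto the busiest usable VM
    else:
        vms.append(0.0)
        i = len(vms) - 1
    start = max(vms[i], ready)
    vms[i] = finish[t] = start + runtime
    print(f"{t:<8} -> vm{i}  [{start:.1f}h, {finish[t]:.1f}h]")

print(f"makespan = {max(finish.values()):.1f}h (deadline {deadline}h), VMs leased = {len(vms)}")

On this toy workflow the forward pass alone would use two VMs to finish in 6 hours, while the backward/consolidation pass fits all five tasks on a single VM and still finishes within the 8-hour deadline, which is the kind of VM-overhead reduction the abstract describes.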
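The two streaming objectives can likewise be summarized as constrained optimization problems. The notation below is illustrative rather than the dissertation's own: S denotes a candidate schedule, TP(S) its throughput, C(S) its monetary execution cost, B the budget, and TP_min the required minimum throughput.

% Budget-constrained problem, addressed by B-RATE and B-SWAP:
\max_{S}\; TP(S) \quad \text{subject to} \quad C(S) \le B

% Throughput-constrained problem, addressed by TP-RATE and TP-SWAP:
\min_{S}\; C(S) \quad \text{subject to} \quad TP(S) \ge TP_{\min}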