Risk assessment models for resource failure in grid computing

Service Level Agreements (SLAs) are introduced in order to overcome the limitations associated with the best-effort approach in Grid computing, and to accordingly make Grid computing more attractive for commercial uses. However, commercial Grid providers are not keen to adopt SLAs since there is a r...


Bibliographic Details
Main Author: Alsoghayer, Raid Abdullah
Other Authors: Djemam, K.
Published: University of Leeds 2011
Subjects: 658.05
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.541398
id ndltd-bl.uk-oai-ethos.bl.uk-541398
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-541398; 2017-10-04T03:36:35Z; Risk assessment models for resource failure in grid computing; Alsoghayer, Raid Abdullah; Djemam, K.; 2011; 658.05; University of Leeds; http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.541398; http://etheses.whiterose.ac.uk/1909/; Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 658.05
spellingShingle 658.05
Alsoghayer, Raid Abdullah
Risk assessment models for resource failure in grid computing
description Service Level Agreements (SLAs) were introduced to overcome the limitations of the best-effort approach in Grid computing and thereby make Grid computing more attractive for commercial use. However, commercial Grid providers are reluctant to adopt SLAs because a resource failure can cause an SLA violation, which incurs a penalty fee; modelling a resource's risk of failure is therefore critical to Grid resource providers. Moving from a best-effort approach to a risk-aware approach for accepting SLAs helps the Grid resource provider deliver a high level of Quality of Service (QoS). Moreover, risk is an important factor in setting the resource price and the penalty fee charged in the case of resource failure. In light of this, we propose a mathematical model to predict the risk of failure of a Grid resource using a discrete-time analytical model driven by reliability functions fitted to observed data. The model relies on the resource's historical information to predict the probability of resource failure (risk of failure) for a given time interval. The model was evaluated by comparing the predicted risk of failure with the observed risk of failure, using availability data gathered from Grid resources. The risk of failure is an important property of a Grid resource, especially when scheduling jobs optimally across resources so as to achieve a business objective. However, in Grid computing, user-centric scheduling algorithms ignore the risk factor and mostly address the minimisation of the cost of the resource allocation or of the overall deadline by which the job must be completed. Therefore, we propose a novel user-centric scheduling algorithm for Bag of Tasks (BoT) applications. The algorithm, which aims to meet user requirements, takes into account the risk of failure, the cost of resources and the job deadline. Through simulation, we demonstrate that the algorithm provides a near-optimal solution for minimising the cost of executing BoT jobs. We also show that the execution time of the proposed algorithm is very low, making it suitable for solving scheduling problems in real time. Risk assessment benefits the resource provider by providing methods to support accepting or rejecting an SLA. Moreover, it enables the resource provider to understand the capacity of the infrastructure and thereby plan future investment. Scheduling algorithms benefit the resource provider by providing methods to meet user requirements and make better use of resources. The ability to adopt a risk assessment method and user-centric algorithms makes the exploitation of Grid systems more realistic.
author2 Djemam, K.
author_facet Djemam, K.
Alsoghayer, Raid Abdullah
author Alsoghayer, Raid Abdullah
author_sort Alsoghayer, Raid Abdullah
title Risk assessment models for resource failure in grid computing
publisher University of Leeds
publishDate 2011
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.541398
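
The description above mentions predicting the probability that a resource fails within a given time interval from a reliability function fitted to historical data. The sketch below illustrates that general idea only; it is not the thesis's model. It assumes a Weibull reliability function with already-fitted `shape` and `scale` parameters, and the function names are hypothetical. The risk over an interval is taken as the standard conditional probability 1 - R(start + duration) / R(start).

```python
import math


def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t / scale) ** shape): probability of surviving past time t.

    Assumption: a Weibull form is used purely for illustration; the
    record does not say which reliability functions the thesis fits.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    return math.exp(-((t / scale) ** shape))


def risk_of_failure(start, duration, shape, scale):
    """Probability of failing in (start, start + duration], given the
    resource is still up at time `start`."""
    r_start = weibull_reliability(start, shape, scale)
    r_end = weibull_reliability(start + duration, shape, scale)
    return 1.0 - r_end / r_start


if __name__ == "__main__":
    # Hypothetical parameters: shape > 1 means the failure rate grows with age.
    print(risk_of_failure(start=0.0, duration=10.0, shape=2.0, scale=100.0))
    print(risk_of_failure(start=50.0, duration=10.0, shape=2.0, scale=100.0))
```

With `shape` equal to 1 the Weibull form reduces to the exponential distribution, so the risk for a fixed duration does not depend on `start`; with `shape` greater than 1 the same duration carries more risk later in the resource's life.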