Placement of replicas in large-scale data grid environments

Bibliographic Details
Main Author: Shorfuzzaman, Mohammad
Other Authors: Eskicioglu, Rasit (Computer Science); Graham, Peter (Computer Science)
Published: 2012
Subjects: Data grids; Replication; Dynamic programming; Distributed replica placement; Access performance; Centralized replica placement
Online Access: http://hdl.handle.net/1993/5209
Contributors: Anderson, John (Computer Science); Diamond, Jeff (Electrical and Computer Engineering); Reinefeld, Alexander (Computer Science, Zuse Institute Berlin, Germany)
Date: 2012-03-26

Description
Data Grids provide services and infrastructure for distributed data-intensive applications that access massive, geographically distributed datasets. An important technique for speeding up access in Data Grids is replication, which provides nearby data access. Although data replication is one of the major techniques for improving data access performance, the problem of replica placement has not been widely studied for large-scale Grid environments. In this thesis, I propose improved data placement techniques for replicating potentially large data files in wide-area data grids. These techniques aim to achieve faster data access as well as efficient utilization of bandwidth and storage resources. At the core of my approach is a new, highly distributed replica placement algorithm that places data in strategic locations to improve overall data access performance while satisfying varying user/application and system demands. More efficient access to large data will make large-scale, data- and compute-intensive collaborative scientific endeavors more practical. My thesis makes several contributions toward improving the state of the art in replica placement for large-scale data grid environments.
The major contributions are:
(i) a new popularity-driven dynamic replica placement algorithm for hierarchically structured data grids that balances storage space utilization and access latency;
(ii) an adaptive version of the base algorithm that dynamically adjusts the frequency and degree of replication based on factors such as data request arrival rates and available storage capacities;
(iii) a new highly distributed algorithm that determines a near-optimal replica placement while minimizing replication cost (access and update) for a given traffic pattern; and
(iv) a distributed QoS-aware replica placement algorithm that supports multiple quality requirements, from both user and system perspectives, to enable efficient transfers of large replicas.
Simulation results using widely observed data access patterns demonstrate how the effectiveness of my replica placement techniques is affected by various factors, such as grid network characteristics (i.e., topology, number of nodes, storage and workload capacities of replica servers, link capacities, and traffic pattern) and QoS requirements. Finally, I compare the performance of my algorithms with a number of relevant algorithms from the literature and demonstrate their usefulness and superiority under the conditions of interest.
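To make the idea of "replication cost (access and update) for a given traffic pattern" concrete, here is a minimal, hypothetical sketch; it is not the thesis's algorithm. It assumes a tree-shaped (hierarchical) grid in which the root holds the master copy, a read climbs toward the root until it reaches a replica, and every update issued at the root is pushed along tree edges to all replicas. The tree, read rates, write rate, and the names `access_cost`, `update_cost`, and `best_placement` are illustrative inventions, and the exhaustive search stands in for, rather than reproduces, the distributed near-optimal algorithm described above.

```python
"""Hypothetical access + update cost model for replica placement on a tree.

Illustrative sketch only, not the thesis's algorithm. Assumes the root holds
the master copy, a read at node v is served by the nearest replica on the
path from v up to the root, and each write at the root is pushed to every
replica along tree edges (one unit of cost per edge).
"""
from itertools import combinations

# Hypothetical hierarchy: PARENT[v] is v's parent; the root's parent is None.
PARENT = {"root": None, "a": "root", "b": "root", "a1": "a", "a2": "a", "b1": "b"}
# Hypothetical read rates per node (requests per unit time).
READS = {"root": 0, "a": 1, "b": 1, "a1": 8, "a2": 6, "b1": 2}
WRITE_RATE = 3  # hypothetical updates per unit time, issued at the root


def path_to_root(v):
    """Return [v, parent(v), ..., root]."""
    path = []
    while v is not None:
        path.append(v)
        v = PARENT[v]
    return path


def access_cost(replicas):
    """Sum over nodes of read rate times hops to the nearest replica above."""
    holders = set(replicas) | {"root"}  # the root always holds the master copy
    total = 0
    for v, rate in READS.items():
        hops = next(i for i, u in enumerate(path_to_root(v)) if u in holders)
        total += rate * hops
    return total


def update_cost(replicas):
    """Write rate times the number of edges linking all replicas to the root."""
    edges = set()
    for r in replicas:
        path = path_to_root(r)
        edges.update(zip(path, path[1:]))  # (child, parent) edges on the path
    return WRITE_RATE * len(edges)


def total_cost(replicas):
    return access_cost(replicas) + update_cost(replicas)


def best_placement(max_replicas):
    """Exhaustively try every non-root placement of up to max_replicas copies."""
    candidates = [v for v in PARENT if v != "root"]
    best = ((), total_cost(()))
    for k in range(1, max_replicas + 1):
        for combo in combinations(candidates, k):
            cost = total_cost(combo)
            if cost < best[1]:
                best = (combo, cost)
    return best


if __name__ == "__main__":
    print(best_placement(3))  # (('a', 'a1', 'a2'), 14)
```

Raising `WRITE_RATE` tends to shrink the best placement, because each extra replica adds edges that every update must traverse; this access/update trade-off is what a placement algorithm has to balance. Exhaustive search is exponential in the number of nodes, so it only suits toy trees like this one.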