Choosing Between Remote I/O versus Staging in Distributed Environment
Today, scientific applications and experiments have become increasingly complex and more demanding in terms of their computational and data requirements. The amount of data generated and used has grown at a very rapid rate. As tens or hundreds of terabytes of data for a single application is very co...
Main Author: | Suslu, Ibrahim Hakki |
---|---|
Other Authors: | Kosar, Tevfik |
Format: | Others |
Language: | en |
Published: | LSU 2010 |
Subjects: | Computer Science |
Online Access: | http://etd.lsu.edu/docs/available/etd-06082010-092441/ |
id |
ndltd-LSU-oai-etd.lsu.edu-etd-06082010-092441 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-LSU-oai-etd.lsu.edu-etd-06082010-092441 2013-01-07T22:52:50Z Choosing Between Remote I/O versus Staging in Distributed Environment Suslu, Ibrahim Hakki Computer Science Today, scientific applications and experiments have become increasingly complex and more demanding in terms of their computational and data requirements. The amount of data generated and used has grown at a very rapid rate. While tens or hundreds of terabytes of data for a single application are common today, petabytes and even exabytes of data will be common in a few years. One of the major challenges in distributed computing environments is how to access these large datasets remotely over the network. Data staging and remote I/O are the most widely used data access methods for distributed applications. Application developers generally choose one over the other intuitively, without making any scientific comparison specific to their applications, since no generic model is available for them to use. In this thesis, we develop generic models and set guidelines that help application developers choose the most appropriate data access method for their application. We define the parameters that potentially affect the end-to-end performance of distributed applications that need to access remote data. To achieve our goal, we implement a series of synthetic benchmark applications to simulate different data access patterns. We run these benchmark applications in different distributed computing settings with different parameters, such as network bandwidth, server and client capabilities, and data access ratio. We also use different remote I/O protocols to show the importance of the protocol in making a decision. We use regression analysis to develop applicable generic models for comparing different data access methods, and we test our models in a real-life application.
The main contribution of our thesis is a set of generic models that can be applied to most data-intensive distributed applications to decide the best data access technique for those applications. Our models give scientists and application developers an opportunity to choose the best data access method before actually running the application. Kosar, Tevfik Van Scotter, James R Gabrielle, Allen Karki, Bijaya Ishak, Sherif LSU 2010-06-10 text application/pdf http://etd.lsu.edu/docs/available/etd-06082010-092441/ en unrestricted I hereby certify that, if appropriate, I have obtained and attached herein a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to LSU or its agents the non-exclusive license to archive and make accessible, under the conditions specified below and in appropriate University policies, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. |
collection |
NDLTD |
language |
en |
format |
Others |
sources |
NDLTD |
topic |
Computer Science |
spellingShingle |
Computer Science Suslu, Ibrahim Hakki Choosing Between Remote I/O versus Staging in Distributed Environment |
description |
Today, scientific applications and experiments have become increasingly complex and more demanding in terms of their computational and data requirements. The amount of data generated and used has grown at a very rapid rate. While tens or hundreds of terabytes of data for a single application are common today, petabytes and even exabytes of data will be common in a few years. One of the major challenges in distributed computing environments is how to access these large datasets remotely over the network.
Data staging and remote I/O are the most widely used data access methods for distributed applications. Application developers generally choose one over the other intuitively, without making any scientific comparison specific to their applications, since no generic model is available for them to use.
In this thesis, we develop generic models and set guidelines that help application developers choose the most appropriate data access method for their application. We define the parameters that potentially affect the end-to-end performance of distributed applications that need to access remote data.
To achieve our goal, we implement a series of synthetic benchmark applications to simulate different data access patterns. We run these benchmark applications in different distributed computing settings with different parameters, such as network bandwidth, server and client capabilities, and data access ratio. We also use different remote I/O protocols to show the importance of the protocol in making a decision. We use regression analysis to develop applicable generic models for comparing different data access methods, and we test our models in a real-life application.
The main contribution of our thesis is a set of generic models that can be applied to most data-intensive distributed applications to decide the best data access technique for those applications. Our models give scientists and application developers an opportunity to choose the best data access method before actually running the application.
|
author2 |
Kosar, Tevfik |
author_facet |
Kosar, Tevfik Suslu, Ibrahim Hakki |
author |
Suslu, Ibrahim Hakki |
author_sort |
Suslu, Ibrahim Hakki |
title |
Choosing Between Remote I/O versus Staging in Distributed Environment |
title_short |
Choosing Between Remote I/O versus Staging in Distributed Environment |
title_full |
Choosing Between Remote I/O versus Staging in Distributed Environment |
title_fullStr |
Choosing Between Remote I/O versus Staging in Distributed Environment |
title_full_unstemmed |
Choosing Between Remote I/O versus Staging in Distributed Environment |
title_sort |
choosing between remote i/o versus staging in distributed environment |
publisher |
LSU |
publishDate |
2010 |
url |
http://etd.lsu.edu/docs/available/etd-06082010-092441/ |
work_keys_str_mv |
AT susluibrahimhakki choosingbetweenremoteioversusstagingindistributedenvironment |
_version_ |
1716477744354689024 |