Data Allocation for Distributed Programs
Main Author: | Setiowijoso, Liono |
---|---|
Format: | Others |
Published: | PDXScholar, 1995 |
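Subjects: | Electronic data processing -- Distributed processing; Electrical and Computer Engineering; Engineering |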
Online Access: | https://pdxscholar.library.pdx.edu/open_access_etds/5102 https://pdxscholar.library.pdx.edu/cgi/viewcontent.cgi?article=6174&context=open_access_etds |
id |
ndltd-pdx.edu-oai-pdxscholar.library.pdx.edu-open_access_etds-6174 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-pdx.edu-oai-pdxscholar.library.pdx.edu-open_access_etds-6174 2019-10-26T05:12:07Z Data Allocation for Distributed Programs Setiowijoso, Liono 1995-08-11T07:00:00Z text application/pdf https://pdxscholar.library.pdx.edu/open_access_etds/5102 https://pdxscholar.library.pdx.edu/cgi/viewcontent.cgi?article=6174&context=open_access_etds Dissertations and Theses PDXScholar Electronic data processing -- Distributed processing Electrical and Computer Engineering Engineering |
collection |
NDLTD |
format |
Others |
sources |
NDLTD |
topic |
Electronic data processing -- Distributed processing Electrical and Computer Engineering Engineering |
description |
This thesis shows that both data and code must be efficiently distributed to achieve good performance in a distributed system. Most previous research has tried either to distribute code structures to improve parallelism or to distribute data to reduce communication costs. Code distribution (exploiting functional parallelism) distributes or duplicates function code to optimize parallel performance. Data distribution, on the other hand, places data structures as close as possible to the function code that uses them, so that communication costs are reduced. Dataflow researchers in particular have focused primarily on code partitioning and assignment. We have adapted existing data allocation algorithms for use with an existing dataflow-based system, ParPlum, which executes dataflow graphs on networks of workstations. To evaluate the impact of data allocation, we extended ParPlum to handle data structures more effectively. We then implemented tools that extract the information relevant to the mapping algorithms from dataflow graphs, and fed this information to our version of a data distribution algorithm. To examine the relation between code and data parallelism, we added optimizations for the distribution of the loop function components and the data structure access components. All of this is done automatically, without programmer or user involvement. We ran a number of experiments using matrix multiplication as our workload, with different numbers of processors and different existing partitioning and allocation algorithms. Our results show that automatic data distribution greatly improves the performance of distributed dataflow applications. For example, with 15 x 15 matrices on 7 machines, applying data distribution speeds up execution by about 80%, and using data distribution together with our code optimizations speeds up execution over the base case by 800%. Our work shows that it is possible to make efficient use of distributed networks with compiler support, and that both code mapping and data mapping must be considered to achieve optimal performance. |
author |
Setiowijoso, Liono |
title |
Data Allocation for Distributed Programs |
publisher |
PDXScholar |
publishDate |
1995 |
url |
https://pdxscholar.library.pdx.edu/open_access_etds/5102 https://pdxscholar.library.pdx.edu/cgi/viewcontent.cgi?article=6174&context=open_access_etds |
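The description field states the goal of data distribution (place each data structure near the code that uses it) but this record does not give the thesis's actual algorithm. The following is a minimal Python sketch, not the ParPlum implementation. It uses a swapped-in greedy rule: given a fixed assignment of loop iterations to processors, put each data structure on the processor that accesses it most, then count the accesses that remain remote. The matrix multiplication access model (iteration i computes row i of C, one access per element, iterations split into contiguous blocks) and all function names (`owner`, `matmul_accesses`, `allocate`, `remote_accesses`) are hypothetical.

```python
from collections import Counter, defaultdict


def owner(i, n, p):
    """Processor running loop iteration i when n iterations are split into
    p contiguous, nearly equal blocks (hypothetical code mapping)."""
    return i * p // n


def matmul_accesses(n, p):
    """Count accesses per (processor, data structure) for C = A x B.

    Hypothetical model: iteration i computes row i of C, reads row i of A
    and every row of B, and each element touched counts as one access.
    """
    counts = Counter()
    for i in range(n):
        proc = owner(i, n, p)
        counts[(proc, f"A[{i}]")] += n          # read row i of A
        counts[(proc, f"C[{i}]")] += n          # write row i of C
        for k in range(n):
            counts[(proc, f"B[{k}]")] += n      # read row k of B
    return counts


def allocate(counts):
    """Greedy single-copy placement: each data structure goes to the
    processor that accesses it most often."""
    per_struct = defaultdict(Counter)
    for (proc, name), c in counts.items():
        per_struct[name][proc] += c
    # most_common(1) returns the highest-count processor; ties go to the
    # processor encountered first.
    return {name: procs.most_common(1)[0][0] for name, procs in per_struct.items()}


def remote_accesses(counts, placement):
    """Accesses made by a processor to data placed on a different processor."""
    return sum(c for (proc, name), c in counts.items() if placement[name] != proc)


if __name__ == "__main__":
    counts = matmul_accesses(15, 7)
    greedy = allocate(counts)
    centralized = {name: 0 for (_, name) in counts}  # all data on processor 0
    print(remote_accesses(counts, centralized))  # 3060
    print(remote_accesses(counts, greedy))       # 2700
```

In this toy model, with 15 x 15 matrices on 7 processors, the greedy placement makes every access to rows of A and C local. Every processor still reads all of B, though, so with a single copy of each row most of B's traffic stays remote. This illustrates why the best data placement depends on where the code runs. These counts come from the toy model only; they are not the thesis's measurements.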