Optimizing hardware granularity in parallel systems
Main Author: Kelly, Thomas
Published: University of Edinburgh, 1995
Subjects: 004
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.653270
               http://hdl.handle.net/1842/15141
Description:
For parallel architectures to deliver significantly better performance than uniprocessors, the benefits of splitting the workload among several processing elements must outweigh the overheads of this "divide and conquer" strategy. Whether they do depends on the nature of the algorithm and on the cost:performance functions of the real computer hardware available at a given time. This thesis investigates the tradeoff between the grain of the hardware and the speed of the hardware, in an attempt to show how the optimal degree of hardware parallelism can be assessed. A model is developed of the execution time T of an algorithm on a machine as a function of the number of nodes, N. The model is used to examine the degree to which an optimal value of N, corresponding to minimum execution time, can be obtained. Specifically, the optimization is investigated assuming a particular base architecture, an algorithm or class of algorithms, and an overall hardware cost. Two base architectures and algorithm types are considered, corresponding to two common classes of parallel architecture: a shared-memory multi-processor and a message-passing multi-computer. The former is represented by a simple shared-bus multi-processor in which each processing element operates on data held in a global shared store. The latter is represented by a two-dimensional mesh-connected multi-computer, in which all memory is private and data sharing is carried out through "messages" passed explicitly among the processing elements.
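The abstract does not give the functional form of the execution-time model, so the following is only an illustrative sketch of the kind of tradeoff it describes: under a fixed overall hardware budget, more nodes means slower (cheaper) nodes but more parallelism, and the overhead term differs between a shared-bus machine and a 2-D mesh. All parameter names, cost and overhead functions below are assumptions made for illustration, not taken from the thesis.

```python
import math

def execution_time(n_nodes, work, comm_per_node, total_cost, arch="bus"):
    """Toy model of execution time T(N) under a fixed hardware budget.

    Illustrative assumptions (not the thesis's actual model):
      - per-node speed grows with the per-node budget: speed ~ sqrt(total_cost / N)
      - compute time is the evenly divided workload over node speed
      - overhead: shared-bus contention grows roughly linearly with N,
        while 2-D mesh message latency grows with the mesh diameter (~ sqrt(N))
    """
    speed = math.sqrt(total_cost / n_nodes)        # diminishing returns on faster nodes
    compute = work / (n_nodes * speed)             # ideal division of the workload
    if arch == "bus":
        overhead = comm_per_node * n_nodes         # bus contention
    else:  # "mesh"
        overhead = comm_per_node * math.sqrt(n_nodes)  # mesh diameter
    return compute + overhead

def optimal_nodes(work, comm_per_node, total_cost, arch="bus", max_nodes=1024):
    """Scan N = 1..max_nodes and return the N that minimises T(N)."""
    return min(range(1, max_nodes + 1),
               key=lambda n: execution_time(n, work, comm_per_node, total_cost, arch))

if __name__ == "__main__":
    for arch in ("bus", "mesh"):
        n_opt = optimal_nodes(work=1e6, comm_per_node=5.0, total_cost=100.0, arch=arch)
        t_opt = execution_time(n_opt, 1e6, 5.0, 100.0, arch)
        print(f"{arch}: optimal N = {n_opt}, T(N) = {t_opt:.1f}")
```

With these assumed overhead functions the shared-bus variant favours a smaller N than the mesh, since its contention term grows faster; the actual optimum in the thesis depends on the algorithm class and the cost:performance functions it models.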