Summary: Loops are the main source of parallelism in scientific programs. Hence, several
techniques have been developed to detect parallelism in these loops and transform them into
parallel forms. In this dissertation, compile-time transformations and efficient parallel
execution of loops with various types of dependencies are investigated. First, Doacross
loops with uniform dependencies are considered for execution on distributed memory
parallel machines (multicomputers). Most known Doacross loop execution techniques
can be applied efficiently only to shared memory parallel machines. In this thesis, a
code reordering technique, improvements to partitioning strategies, and a method for
balancing communication and parallelism are presented to reduce the execution time of
Doacross loops on multicomputers.
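
To make the dependence pattern concrete, the sketch below shows a Doacross loop with
a uniform dependence of distance 1: iteration i consumes a value produced by iteration
i-1, so iterations must be partially serialized, but the independent part of each body can
still overlap with its neighbors. The dissertation targets message-passing multicomputers;
this sketch uses OpenMP's shared-memory doacross support (ordered with depend
clauses) purely for concreteness, and the array names are illustrative.

    #include <omp.h>

    #define N 1000
    double a[N], b[N], c[N];

    void doacross_distance_one(void) {
        #pragma omp parallel for ordered(1)
        for (int i = 1; i < N; i++) {
            #pragma omp ordered depend(sink: i - 1)  /* wait for iteration i-1 */
            a[i] = a[i - 1] + b[i];   /* statement carrying the distance-1 dependence */
            #pragma omp ordered depend(source)       /* signal iteration i+1 */
            c[i] = 2.0 * a[i];        /* independent work; overlaps across iterations */
        }
    }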

As with most parallelizing transformation techniques, only single loop nests are
considered in the first part of this dissertation. However, parallelizing each loop nest in
a program separately, even if an optimal execution can be obtained for each loop nest,
may not result in an efficient execution of the program as a whole because of
communication overhead across loops in a multicomputer environment. Hence,
across-loop data dependence representation and analysis are introduced in this work to
improve the parallel execution of the whole code. Our contribution consists of finding
and representing data dependencies whose sources and destinations are subspaces of the
iteration space, chiefly subspaces shared across loops. This type of dependence
information is used in this thesis to improve global iteration space partitioning,
automatic generation of communication statements across loops, and index alignment.
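
For instance (an illustrative fragment, not drawn from the dissertation), an across-loop
dependence arises when one loop nest defines values that a later loop nest uses. If the
two iteration spaces are partitioned identically, so that iteration i of both loops is
mapped to the same processor (index alignment), the dependence is satisfied locally and
no communication statements need to be generated between the loops.

    void across_loop_dependence(int n, double *a, const double *b,
                                double *c, const double *d) {
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * b[i];    /* source loop: defines a[i]          */
        for (int i = 0; i < n; i++)
            c[i] = a[i] + d[i];   /* sink loop: uses a[i] defined above */
    }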

The final part of this dissertation presents new parallelizing techniques for loops with
irregular and complex dependencies. Various data dependence analysis algorithms can be
found in the literature, even for loops with complex array indexing. However, the
improvement in data dependence testing has not been followed by a similar improvement
in restructuring transformations for loops with complex dependencies, so such loops are
mostly executed serially. Our parallelizing techniques for these loops consist of
identifying regions of the iteration space in which all iterations can be executed in
parallel.
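
As a minimal illustration of the region idea (a hypothetical sketch, not the dissertation's
algorithm), consider a loop whose subscript array idx makes the dependence pattern
irregular. Assuming analysis has established that idx[0..m-1] holds pairwise-distinct
values, none of which lies in 0..m-1, the first m iterations neither conflict with nor feed
one another, so that region can run fully in parallel while the remaining iterations run
serially; all names and bounds here are placeholders.

    void region_parallel(int n, int m, const int *idx,
                         double *a, const double *b) {
        /* Region assumed proven dependence-free: these iterations
           access pairwise-disjoint locations of a[]. */
        #pragma omp parallel for
        for (int i = 0; i < m; i++)
            a[idx[i]] = a[i] + b[i];

        /* Remaining region may carry dependencies: execute serially. */
        for (int i = m; i < n; i++)
            a[idx[i]] = a[i] + b[i];
    }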

The advantages of all the transformations presented in this dissertation are: 1) they
significantly reduce the execution time of loops with various types of dependencies, as
shown in this work using the MasPar machine; and 2) they can be implemented at compile
time, which makes the task of parallel programming easier.