Summary: | This thesis is about the design and implementation of a program transformation technique for parallelizing loops. The technique is based upon unimodular transformations that was originally proposed to parallelize imperative language loops. Such loops, in which arrays are the dominant data structure, often contain ample parallelism. === Baherjee in (Ban90) proposed algorithms to parallelize double loops in imperative languages. His technique subsumes other results in the area of parallelizing double loops. We have implemented those algorithms in a program called U-Transformer. === In recursively defined Monolithic Array Constructors in functional languages, which we refer to as MACs, a major problem that arises in their efficient parallel implementation is the overhead of scheduling and synchronizing the array operations. In a MAC, explicit evaluation order of the array elements is specified. If a compiler does not identify the data dependences and use this information to determine a safe order, then a conservative implementation has to rely on inefficient or expensive run-time support to ensure that an array element is defined before it is used. === By extending the parallelizing algorithms in (Ban90), we developed algorithms that for a given two-dimensional MAC, explicitly introduce a safe parallel evaluation order. This provides a solution to the problem above. We also implemented these algorithms in a program called AC-Transformer. In both implementations, a one-step unimodular transformation is applied to produce an equivalent double loop in which the outer loop or the inner loop can run in parallel.
|