Summary: | Doctoral === National Chiao Tung University === Department of Computer Science and Information Engineering === 88 === Many methods exist for partitioning nested loops. However, most of them perform poorly when partitioning loops with non-uniform dependencies. This dissertation proposes several generalized and optimized partitioning mechanisms to exploit parallelism in nested loops with non-uniform dependencies. In this way, current highly parallel multiprocessor systems can be fully utilized.
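To make the term "non-uniform dependencies" concrete, the following minimal sketch enumerates the dependence distance vectors of a small two-level loop nest by brute force. The loop body `A[2*i+3][j+1] = A[i+2*j+1][i+j+3]`, the bound `N = 10`, and all function names are hypothetical choices made for illustration; they are not taken from the dissertation.

```python
from itertools import product

N = 10  # hypothetical loop bound: i and j both run from 1 to N


def write_cell(i, j):
    # Array element written by iteration (i, j) of the assumed loop body.
    return (2 * i + 3, j + 1)


def read_cell(i, j):
    # Array element read by iteration (i, j) of the assumed loop body.
    return (i + 2 * j + 1, i + j + 3)


def dependence_distances(n):
    """Return the set of distance vectors between dependent iterations."""
    iters = list(product(range(1, n + 1), repeat=2))
    writers = {}
    for it in iters:
        writers.setdefault(write_cell(*it), []).append(it)
    distances = set()
    for reader in iters:
        for writer in writers.get(read_cell(*reader), []):
            if writer == reader:
                continue
            # The lexicographically earlier iteration runs first in the
            # sequential loop, so it is the source of the dependence.
            src, dst = sorted((writer, reader))
            distances.add((dst[0] - src[0], dst[1] - src[1]))
    return distances


distances = dependence_distances(N)
print(sorted(distances))
```

Because the subscripts couple `i` and `j` with different coefficients, the set contains several different distance vectors rather than a single constant one, which is what makes the dependence pattern non-uniform.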
First, we propose new techniques for extracting parallelism from loop nests with non-uniform dependencies, which expose more parallelism by exploiting the irregularity of those dependencies. These mechanisms are parallelization part splitting (PPS), partial parallelization decomposition (PPD), irregular loop interchange (ILI), and growing pattern detection (GPD). They not only use transformation techniques but also detect special parallelization patterns in nested loops with non-uniform dependencies. These mechanisms can be applied to loops with special dependence patterns and can also be combined with the new techniques that follow. However, most loops do not have such special dependence vector patterns. Thus, a new loop partitioning mechanism that can handle loops with general dependence vector patterns is necessary.
Second, we propose a loop partitioning mechanism called the Optimized Three Region Partitioning (OTRP) method, based on dependence convex theory, which divides a loop with general dependence vector patterns into variable-size partitions. It partitions the loop into two parallel regions and one serial region. However, one serial region still remains under this mechanism. Thus, we propose two approaches to resolve this situation. One is to parallelize the serial region effectively, and the other is to develop a new method whose performance is independent of the size of the inherent serial region.
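The idea of two parallel regions around one serial region can be illustrated with a naive enumerative sketch. This is not the OTRP algorithm, which works on dependence convex hulls; it simply builds the dependence graph by brute force for a hypothetical loop body `A[2*i+3][j+1] = A[i+2*j+1][i+j+3]` and classifies each iteration by whether it has predecessors or successors. All names and the loop body are assumptions made for illustration.

```python
from itertools import product


def write_cell(i, j):
    # Array element written by iteration (i, j) of the assumed loop body.
    return (2 * i + 3, j + 1)


def read_cell(i, j):
    # Array element read by iteration (i, j) of the assumed loop body.
    return (i + 2 * j + 1, i + j + 3)


def three_phase_partition(n):
    """Split iterations into (first parallel, serial, last parallel)."""
    iters = list(product(range(1, n + 1), repeat=2))  # lexicographic order
    # write_cell is one-to-one, so each element has at most one writer.
    writers = {write_cell(*it): it for it in iters}
    preds = {it: set() for it in iters}
    succs = {it: set() for it in iters}
    for reader in iters:
        writer = writers.get(read_cell(*reader))
        if writer is None or writer == reader:
            continue
        # The earlier iteration in sequential order is the source.
        src, dst = sorted((writer, reader))
        preds[dst].add(src)
        succs[src].add(dst)
    # Phase 1: nothing to wait for, so these can all run in parallel.
    first = [it for it in iters if not preds[it]]
    # Phase 2: has both predecessors and successors; run in original order.
    serial = [it for it in iters if preds[it] and succs[it]]
    # Phase 3: nothing depends on these, so they run in parallel at the end.
    last = [it for it in iters if preds[it] and not succs[it]]
    return first, serial, last, preds


first, serial, last, preds = three_phase_partition(10)
print(len(first), len(serial), len(last))
```

The schedule is valid because no iteration in the first phase has a predecessor, no iteration in the last phase has a successor, and the serial phase keeps the original lexicographic order. The size of the serial list is what the later mechanisms in this abstract aim to reduce or sidestep.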
Third, we propose the Optimized Dependence Convex Hull Partitioning (ODCHP) method, whose efficiency is independent of the size of the inherent serial region. This mechanism is suitable for loops with a large inherent serial region. Although its algorithm has higher complexity, it handles this case more effectively because it addresses it directly.
Finally, we develop a new mechanism called the two stage partitioning (TSP) mechanism, based on dependence convex theory and the three region partitioning technique, which can parallelize the serial region of the OTRP method effectively. Compared with other popular techniques, our schemes show a dramatic improvement in performance on popular program models and real program code segments.
|