Summary: | Multithreading is a promising approach to addressing problems inherent in multiprocessor systems, such as network and synchronization latencies. Moreover, the benefits of multithreading are not limited to loop-based algorithms; they also apply to irregular parallelism. EARTH (Efficient Architecture for Running THreads) is a multithreaded model supporting fine-grain, non-preemptive threads. The model is supported by a C-based runtime system that provides the multithreaded environment for executing concurrent programs.
This thesis describes the design and implementation of a set of dynamic load balancing algorithms, and an in-depth study of their behavior with divide-and-conquer, regular, and irregular classes of applications. The results are based on EARTH-SP2, an implementation of the EARTH program execution model on the IBM SP-2, a distributed-memory multiprocessor system. The main results of this study are as follows: (1) A randomizing load balancer with both sender and receiver components that uses global load state information provides scalable, robust performance for recursive and irregular applications. Furthermore, a randomizing algorithm performs best as long as the cost of computing the random number does not dominate the overall thread execution time. (2) Load state information outperforms history information for irregular and recursive applications. For regular applications, however, history information is preferable. (3) A purely sender-initiated algorithm is the best choice in two scenarios: barrier-synchronized applications, and very fine-grain applications at low input workloads. (4) A simple work-stealing load balancer is preferable for applications with modest thread granularities and very low workloads.
Other major contributions include: (1) A description of a runtime system for a non-blocking, non-preemptive multithreaded programming model. (2) A detailed analysis of the costs associated with EARTH operations, and a comparative study of EARTH performance on three different platforms. (3) A proposal for a new classification scheme for multithreaded systems, supplemented by an extensive literature survey.
|