Studies on memory consistency and synchronization : failure detection in parallel programs

We have studied two related issues in the design, execution and debugging of shared memory parallel programs: memory consistency and techniques to detect synchronization failure in parallel programs. Overlapping execution of operations from a thread is an efficient way to effectively tolerate memor...

Full description

Bibliographic Details
Main Author: Fu, Taiqi
Format: Others
Published: 1998
Online Access:http://spectrum.library.concordia.ca/416/1/NQ44810.pdf
Fu, Taiqi <http://spectrum.library.concordia.ca/view/creators/Fu=3ATaiqi=3A=3A.html> (1998) Studies on memory consistency and synchronization : failure detection in parallel programs. PhD thesis, Concordia University.
Description
Summary:We have studied two related issues in the design, execution and debugging of shared memory parallel programs: memory consistency and techniques to detect synchronization failure in parallel programs. Overlapping execution of operations from a thread is an efficient way to effectively tolerate memory latency in shared memory multiprocessor systems. for a programmer to reason about his program, a sequentially consistent machine with full generality must execute memory operations sequentially in program order, leading to poor performance. We propose Link Consistency which still preserves Sequential Consistency for parallel programs that contain no data race or synchronization failure while permitting out of (program) order execution of operations. By classifying synchronization operations according to whether they satisfy some distinct conditions ( Exclusive Producer and Exclusive Consumer ), Link Consistency allows richer relaxation of program orders than some previous work such as Release Consistency. Compared to other work that also tries to improve upon Release Consistency such as PLpc, Link Consistency is conceptually clearer for programmer, and possesses a rigorous basis for reasoning. We also address the following Synchronization Failure detection problem: Given a trace set from a complete (i.e. successful ) execution, whether there exists another execution that can be derived from the same trace set and is incomplete. The answer to the problem is useful in debugging parallel programs. After showing that the problem is NP-complete, we present some techniques to tackle the problem in practice. These include Trace Reduction, Local Failure Validation, and Partial Order Checking. Trace Reduction may reduce the size of a trace set up to a small fraction of its original size while still preserving its failure property. Local Failure Validation explores a heuristic that is based on individual semaphores rather than all semaphores together. Compared to detecting strategies based on interleaving semantics, Partial Order Checking completely avoids the state explosion problem caused by the large amount of concurrence in trace sets. Our experimental result shows that for many trace sets with synchronization failures, the above techniques solve the detection problem in time linear to the trace set sizes, which are measured by N n , the number of events or by N s x N t , the product of the number of semaphores by the number of program threads, in a trace set.