Prescriptive performance tuning: The R(X) approach

Bibliographic Details
Main Author: Rajamony, Ramakrishnan
Other Authors: Zwaenepoel, Willy E.
Format: Others
Language: English
Published: 2009
Subjects:
Online Access: http://hdl.handle.net/1911/19302
Description
Summary: Programmers often rely on performance analysis tools to provide feedback about the execution of their applications. However, the nature of this feedback is far from satisfactory. Often the feedback is purely descriptive and at a very low level, making it difficult for the programmer to rectify performance problems. This dissertation demonstrates a new approach to performance tuning: prescriptive performance debugging. Compared with existing performance analysis tools, our approach can greatly reduce the burden on the programmer. The basis of this approach is a set of requirements that must be satisfied by a performance analysis tool. In problem domains where these requirements can be met, a performance tool can prescribe source-level changes to improve performance. R$_{\rm x}$ is one such tool that we have developed to improve the performance of explicitly parallel shared-memory programs. R$_{\rm x}$ targets inter-process synchronization and data communication, two significant sources of overhead in shared-memory applications. R$_{\rm x}$ automatically analyzes run-time data from program executions to prescribe transformations that reduce synchronization and some forms of data communication. This feedback is at the source-code level, eliminating the need for machine-level reasoning about the program. A correctness framework ensures that transformations obtained from one or a small set of executions will be applicable to all executions. In a few cases, feedback from R$_{\rm x}$ has made a crucial difference, enabling applications that originally slowed down on multiple processors to achieve a speedup.
In summary, this dissertation makes three contributions: (i) a new approach for designing performance tools, enabling the prescription of source-level changes to improve performance, (ii) a set of algorithms to detect excess synchronization and some forms of excess data communication in explicitly parallel shared-memory programs, and (iii) a set of low-overhead techniques to collect run-time information for performance tuning.