Optimizing UPC Programs for Multi-Core Systems
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2010-01-01 |
Series: | Scientific Programming |
Online Access: | http://dx.doi.org/10.3233/SPR-2010-0310 |
Summary: | The Partitioned Global Address Space (PGAS) model of Unified Parallel C (UPC) can help users express and manage application data locality on non-uniform memory access (NUMA) multi-core shared-memory systems to get good performance. First, we describe several UPC program optimization techniques that are important to achieving good performance on NUMA multi-core computers with examples and quantitative performance results. Second, we use two numerical computing kernels, parallel matrix–matrix multiplication and parallel 3-D FFT, to demonstrate the end-to-end development and optimization for UPC applications. Our results show that the optimized UPC programs achieve very good and scalable performance on current multi-core systems and can even outperform vendor-optimized libraries in some cases. |
ISSN: | 1058-9244, 1875-919X |