COLLECTIVE COMMUNICATION AND BARRIER SYNCHRONIZATION ON NVIDIA CUDA GPU
GPUs (Graphics Processing Units) employ a multi-threaded execution model using multiple SIMD cores. Compared to the use of a single SIMD engine, this architecture can scale to more processing elements. However, GPUs sacrifice the timing properties which made barrier synchronization implicit and collecti...
Main Author: | |
---|---|
Format: | Others |
Published: | UKnowledge 2009 |
Subjects: | |
Online Access: | http://uknowledge.uky.edu/gradschool_theses/635 http://uknowledge.uky.edu/cgi/viewcontent.cgi?article=1639&context=gradschool_theses |