COLLECTIVE COMMUNICATION AND BARRIER SYNCHRONIZATION ON NVIDIA CUDA GPU

GPUs (Graphics Processing Units) employ a multi-threaded execution model using multiple SIMD cores. Compared to using a single SIMD engine, this architecture can scale to more processing elements. However, GPUs sacrifice the timing properties which made barrier synchronization implicit and collecti...


Bibliographic Details
Main Author: Rivera-Polanco, Diego Alejandro
Format: Others
Published: UKnowledge 2009
Subjects: GPU
Online Access: http://uknowledge.uky.edu/gradschool_theses/635
http://uknowledge.uky.edu/cgi/viewcontent.cgi?article=1639&context=gradschool_theses