Towards scalar synchronization in SIMT architectures

Graphics processing units (GPUs) are an important class of compute accelerators. Popular programming models for non-graphics computation on GPUs, such as CUDA and OpenCL, provide an abstraction of many parallel scalar threads. Contemporary GPU hardware groups 32 to 64 scalar threads as a single warp...
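The grouping of scalar threads into warps described above can be illustrated with a small sketch. It assumes that consecutive linear thread indices are packed into the same warp and that the warp width is a parameter (32 or 64 here, matching the range the abstract gives). The helper names `warp_and_lane` and `group_into_warps` are hypothetical and are not taken from the thesis or from any GPU API.

```python
from typing import List, Tuple


def warp_and_lane(thread_id: int, warp_size: int = 32) -> Tuple[int, int]:
    """Map a linear scalar-thread index to (warp index, lane within the warp).

    Hypothetical helper: assumes consecutive thread indices share a warp.
    """
    if warp_size <= 0:
        raise ValueError("warp_size must be positive")
    return thread_id // warp_size, thread_id % warp_size


def group_into_warps(num_threads: int, warp_size: int = 32) -> List[List[int]]:
    """Partition thread indices 0..num_threads-1 into warps of warp_size.

    The last warp is partially filled when num_threads is not a multiple
    of warp_size.
    """
    return [
        list(range(start, min(start + warp_size, num_threads)))
        for start in range(0, num_threads, warp_size)
    ]


if __name__ == "__main__":
    print(warp_and_lane(70))      # warp width 32
    print(warp_and_lane(70, 64))  # warp width 64
    print([len(w) for w in group_into_warps(100)])
```

With a warp width of 32, thread 70 falls in warp 2 at lane 6; with a width of 64 it falls in warp 1 at lane 6. A launch of 100 threads fills three full warps and leaves a fourth warp with only 4 active threads.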


Bibliographic Details
Main Author: Ramamurthy, Arun
Language: English
Published: University of British Columbia, 2011
Online Access: http://hdl.handle.net/2429/37732