Towards scalar synchronization in SIMT architectures
Graphics processing units (GPUs) are an important class of compute accelerators. Popular programming models for non-graphics computation on GPUs, such as CUDA and OpenCL, provide an abstraction of many parallel scalar threads. Contemporary GPU hardware groups 32 to 64 scalar threads into a single warp...
Main Author: | |
---|---|
Language: | English |
Published: | University of British Columbia, 2011 |
Online Access: | http://hdl.handle.net/2429/37732 |