Analysis of Dynamic Warp Formation on GPU for Graphics Workloads

Bibliographic Details
Main Authors: Hsi-Feng Lin, 林希峰
Other Authors: Chia-Lin Yang
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/92083255479150971798
Description
Summary: Master's thesis, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, academic year 99 (ROC calendar).

Graphics Processing Units (GPUs) were originally used only for graphics processing. Today, the rise of parallel computing has encouraged other uses of the GPU. General-purpose GPU (GPGPU) computing exploits the GPU's large number of cores to accelerate complex algorithms in parallel, and much recent GPU research focuses either on GPGPU architecture design or on accelerating applications with GPUs.

GPUs use the Single Instruction, Multiple Thread (SIMT) execution model to simplify control flow. This reduces the area spent on control logic and leaves room for more cores, and thus more computing power. Under SIMT, several threads are grouped into a thread group (warp) that executes each instruction together. However, when a group encounters a branch instruction, its threads may need to follow different paths. On such branch divergence, GPUs commonly use a stack-based method that executes the paths one after another with the inactive threads masked off, which lowers computing utilization and degrades performance. To address this problem, Fung et al. proposed the Dynamic Warp Formation (DWF) mechanism and showed that it is effective for some GPGPU workloads.

In this thesis, we investigate whether DWF is also useful for graphics workloads and analyze our experimental results. We also describe the differences between the GPU architecture and the GPGPU-sim architecture, as well as the hardware design decisions we made for the DWF mechanism. In addition, we report two observations from our experiments. First, DWF increases the likelihood of Write-Buffer-Full stalls (TU stalls), which may in turn increase No-Ready-Warp stalls (SP stalls). Second, the relationship between scheduling policies and branch types is not direct; other factors, such as texture accesses, must also be considered. We describe these observations in detail in the experimental sections.
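The divergence problem and the idea behind DWF can be illustrated with a toy cost model. The Python sketch below is not the thesis's simulator or Fung et al.'s hardware design. It assumes a hypothetical 4-thread warp and a single two-way branch, and it counts only warp-level instruction issues and SIMD lane utilization. The names `stack_based`, `dynamic_warp_formation`, and `WARP_SIZE` are illustrative. The DWF variant also ignores the lane and register-file placement constraints that real hardware must respect. Because it models no memory system, it cannot reproduce stalls such as the Write-Buffer-Full (TU) stalls observed in the thesis.

```python
import math
from dataclasses import dataclass

# Assumption: hypothetical warp width, kept small for readability.
WARP_SIZE = 4


@dataclass
class Result:
    issues: int          # warp-level instruction issues in the divergent region
    utilization: float   # fraction of SIMD lane slots doing useful work


def useful_work(outcomes, taken_len, not_taken_len):
    """Thread-instructions actually needed: each thread runs only its own path."""
    return sum(taken_len if t else not_taken_len for t in outcomes)


def stack_based(outcomes, taken_len, not_taken_len):
    """Per-warp reconvergence stack: a divergent warp runs each path in turn,
    masking off threads that did not take it, then reconverges."""
    issues = 0
    for start in range(0, len(outcomes), WARP_SIZE):
        warp = outcomes[start:start + WARP_SIZE]
        if any(warp):        # at least one thread takes the branch
            issues += taken_len
        if not all(warp):    # at least one thread falls through
            issues += not_taken_len
    work = useful_work(outcomes, taken_len, not_taken_len)
    return Result(issues, work / (issues * WARP_SIZE))


def dynamic_warp_formation(outcomes, taken_len, not_taken_len):
    """Simplified DWF (assumption: no lane/register-file constraints):
    threads from all warps that share a target path are regrouped into
    new, fuller warps before executing that path."""
    n_taken = sum(outcomes)
    n_not_taken = len(outcomes) - n_taken
    issues = (math.ceil(n_taken / WARP_SIZE) * taken_len
              + math.ceil(n_not_taken / WARP_SIZE) * not_taken_len)
    work = useful_work(outcomes, taken_len, not_taken_len)
    return Result(issues, work / (issues * WARP_SIZE))


if __name__ == "__main__":
    # Two 4-thread warps whose threads alternate between the two paths.
    outcomes = [True, False, True, False, False, True, False, True]
    print(stack_based(outcomes, 10, 10))             # Result(issues=40, utilization=0.5)
    print(dynamic_warp_formation(outcomes, 10, 10))  # Result(issues=20, utilization=1.0)
```

With alternating outcomes every warp diverges, so the stack-based scheme issues both paths for each warp at half occupancy, while regrouping threads by path fills each new warp. When all threads in a warp agree, both schemes issue the same instructions, so any DWF benefit depends on the workload's branch behavior, and in a real pipeline it can be offset by effects this model omits.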