Scheduling Algorithms of Co-optimizing Thread-Level-Parallelism and Cache Utilization for GPGPUs


Bibliographic Details
Main Authors: Lu, Chin-Fu, 呂勁甫
Other Authors: Jou, Jing-Yang
Format: Others
Language: en_US
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/99321023691038445807
Description
Summary: Master's === National Chiao Tung University === Department of Electronics Engineering and Institute of Electronics === 102 === Thread-Level-Parallelism (TLP) and cache utilization are two significant performance factors of modern throughput processors. The conflicting relationship between the two factors makes the design a non-trivial task: increasing TLP aggravates cache contention, while avoiding cache contention can limit TLP. The trade-off becomes even more intricate and sensitive when dealing with applications that have irregular data access patterns. Many existing thread scheduling algorithms address only one of these factors at a time. This thesis demonstrates that a significant performance gain is available when the two factors are considered together and properly traded off. To conduct a comprehensive analysis of the performance impact of the two factors, this thesis formulates two thread scheduling problems that characterize the design concerns. A series of solutions is integrated to resolve the scheduling for a set of applications with irregular memory accesses. Experimental results on NVIDIA's Fermi architecture show the performance differences of the proposed thread scheduling under various combinations of constraints. Compared to a widely used thread scheduling scheme, the average improvement in execution time can reach up to 51%.
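
To make the TLP/cache trade-off described in the abstract concrete, the sketch below shows one common way the amount of thread-level parallelism can be throttled on an NVIDIA GPU. This is only an illustration under assumed names and values, not the thesis's scheduling algorithm: it uses a grid-stride ("persistent blocks") kernel so that the number of launched blocks acts as a TLP knob. Fewer resident blocks means fewer concurrent warps and a smaller combined working set contending for each SM's L1 cache; more blocks raises TLP at the cost of heavier contention. The kernel name, blocksPerSM value, and problem size are all hypothetical.

    // Minimal CUDA sketch (illustration only, not the thesis's scheduler):
    // throttling TLP with a persistent, grid-stride kernel. The launched
    // block count is the knob that trades parallelism against cache pressure.
    #include <cuda_runtime.h>

    __global__ void irregular_gather(const float *in, const int *idx,
                                     float *out, int n) {
        // Grid-stride loop: whatever number of blocks is launched covers all
        // n elements, so the launch configuration only changes concurrency.
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x) {
            out[i] = in[idx[i]];   // data-dependent (irregular) access pattern
        }
    }

    int main() {
        const int n = 1 << 20;
        float *in, *out;
        int *idx;
        cudaMalloc(&in,  n * sizeof(float));
        cudaMalloc(&out, n * sizeof(float));
        cudaMalloc(&idx, n * sizeof(int));
        cudaMemset(in,  0, n * sizeof(float));  // placeholder data
        cudaMemset(idx, 0, n * sizeof(int));    // keep indices in range

        int numSMs = 0;
        cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, 0);

        // blocksPerSM is the TLP knob a scheduler would tune per application:
        // small -> low TLP, low cache contention; large -> the reverse.
        int blocksPerSM = 2;
        irregular_gather<<<numSMs * blocksPerSM, 256>>>(in, idx, out, n);
        cudaDeviceSynchronize();

        cudaFree(in); cudaFree(out); cudaFree(idx);
        return 0;
    }

A scheduler of the kind the thesis studies would choose such a concurrency level (and the mapping of threads to data) per application, since the best setting depends on how irregular the access pattern is and how much of the working set fits in the per-SM cache.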