Capability-Aware Workload Partition on Multi-GPU Systems

Bibliographic Details
Main Authors:	Chao, Yen-Ting, 趙硯廷
Other Authors:	You, Yi-Ping
Format:	Others
Language:	zh-TW
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/62772638646338056219

Description
Summary:	Master's thesis, National Chiao Tung University, Institute of Computer Science and Engineering, academic year 105 (ROC calendar). Using multiple graphics processing units (GPUs) to accelerate applications has become increasingly popular in recent years, aided by multi-GPU abstraction techniques. However, an application that has only dependent kernels derives no benefit from multiple GPUs, because its kernels cannot run simultaneously on those GPUs, which lowers GPU utilization. An application with a ‘big’ kernel, one that launches a huge number of threads to process massively parallel data, can also lower the overall throughput of a multi-GPU system. To use multiple GPUs, programmers must manually divide such a kernel into several ‘small’ kernels and dispatch them to different GPUs, which imposes an extra burden. In this paper, we present XVirtCL, an extension of VirtCL (a GPU abstraction framework) that automatically balances the workload of a kernel among multiple GPUs while accounting for the GPUs' differing compute capability levels and minimizing the data transferred among GPUs. XVirtCL comprises (1) a kernel analyzer that determines whether a kernel's workload is suitable for partitioning, (2) a workload scheduling algorithm that balances a kernel's workload among multiple GPUs according to their compute capability levels, and (3) a workload partitioner that splits a kernel into multiple sub-kernels with disjoint sub-NDRange spaces. Preliminary experimental results indicate that the proposed framework maximized the throughput of multiple GPUs for applications with big, regular kernels.
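
The idea of splitting one kernel's index space into disjoint sub-NDRanges sized by device capability can be illustrated with a minimal sketch. This is not XVirtCL's scheduling algorithm, which the summary does not specify; it is a plain proportional split under stated assumptions: a one-dimensional NDRange, a global size that is a multiple of the work-group size, and a hypothetical positive integer "capability" weight per GPU. The function name `partition_ndrange` and its parameters are invented for illustration.

```python
def partition_ndrange(global_size, local_size, capabilities):
    """Split a 1-D NDRange into disjoint, contiguous sub-ranges.

    Returns one (offset, size) pair per device, in device order.
    `capabilities` are hypothetical positive integer weights; a device
    with a larger weight receives proportionally more work-groups.
    """
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    if not capabilities or any(c <= 0 for c in capabilities):
        raise ValueError("capabilities must be positive integers")

    # Partition at work-group granularity so no work-group is cut in half.
    num_groups = global_size // local_size
    total = sum(capabilities)

    # Each device first gets its proportional share, rounded down.
    shares = [num_groups * c // total for c in capabilities]

    # Leftover work-groups go to the devices with the largest remainders.
    leftover = num_groups - sum(shares)
    by_remainder = sorted(
        range(len(capabilities)),
        key=lambda i: (num_groups * capabilities[i]) % total,
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        shares[i] += 1

    # Convert per-device work-group counts into (offset, size) ranges.
    ranges = []
    offset = 0
    for groups in shares:
        size = groups * local_size
        ranges.append((offset, size))  # size may be 0 for a very weak device
        offset += size
    return ranges
```

In an OpenCL setting, each (offset, size) pair would correspond to the global work offset and global work size of one sub-kernel launch on one device. A real partitioner would also have to decide which buffer regions each sub-kernel needs, which this sketch ignores.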

Similar Items