Capability-Aware Workload Partition on Multi-GPU Systems

Master's thesis (碩士), National Chiao Tung University (國立交通大學), Institute of Computer Science and Engineering (資訊科學與工程研究所), academic year 105 (2016).

Using multiple graphics processing units (GPUs) to accelerate applications has become increasingly popular in recent years, aided by multi-GPU abstraction techniques. However, an application that contains only dependent kernels gains no benefit from multiple GPUs, because its kernels cannot run simultaneously across those GPUs, which lowers GPU utilization. Applications with a 'big' kernel, one that launches a huge number of threads to process massively parallel data, can also reduce the overall throughput of a multi-GPU system: programmers must manually divide such a kernel into several 'small' kernels and dispatch them to different GPUs in order to exploit multiple GPU resources, which imposes an extra burden on them. In this thesis we present XVirtCL, an extension of VirtCL (a GPU abstraction framework) that automatically balances the workload of a kernel across multiple GPUs while accounting for their differing compute capabilities and minimizing the data transferred among them. XVirtCL comprises (1) a kernel analyzer that determines whether a kernel's workload is suitable for partitioning, (2) a workload scheduling algorithm that balances a kernel's workload across multiple GPUs while considering their differing compute capabilities, and (3) a workload partitioner that splits a kernel into multiple sub-kernels with disjoint sub-NDRange spaces. Preliminary experimental results indicate that the proposed framework maximized the throughput of multiple GPUs for applications with big, regular kernels.
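
The record contains no XVirtCL code; as a loose illustration of the capability-aware partitioning idea described in the abstract, the sketch below splits a one-dimensional NDRange across GPUs in proportion to a per-device capability weight, keeping each sub-range aligned to the work-group size so the resulting sub-kernels cover disjoint index ranges. The function name, the scalar capability weights, and the 1-D restriction are illustrative assumptions, not the thesis's actual analyzer, scheduler, or partitioner.

    # Hypothetical sketch (Python): capability-proportional split of a 1-D NDRange.
    # Not XVirtCL code; the device weights and names are illustrative assumptions.
    def partition_ndrange(global_size, local_size, capabilities):
        """Return one (offset, size) pair per GPU.

        global_size  -- total number of work-items (a multiple of local_size)
        local_size   -- work-group size
        capabilities -- relative capability weight of each GPU (e.g. measured throughput)
        """
        assert global_size % local_size == 0
        num_groups = global_size // local_size
        total = float(sum(capabilities))

        # Give each device a share of work-groups proportional to its weight.
        shares = [int(num_groups * c / total) for c in capabilities]
        # Hand any work-groups lost to rounding down to the most capable device.
        best = max(range(len(capabilities)), key=lambda i: capabilities[i])
        shares[best] += num_groups - sum(shares)

        sub_ranges, offset = [], 0
        for groups in shares:
            size = groups * local_size
            sub_ranges.append((offset, size))  # disjoint, contiguous sub-NDRange
            offset += size
        return sub_ranges

    # Example: 1,048,576 work-items, work-groups of 256, one GPU roughly twice
    # as capable as the other.
    print(partition_ndrange(1_048_576, 256, [2.0, 1.0]))
    # -> [(0, 699136), (699136, 349440)]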

Bibliographic Details
Main Author: Chao, Yen-Ting (趙硯廷)
Other Authors: You, Yi-Ping (游逸平)
Chinese Title: 在多圖形處理器架構下考量裝置能力進行工作量分散運算
Format: Thesis (學位論文), 48 pages
Language: Chinese (zh-TW)
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/62772638646338056219