Capability-Aware Workload Partition on Multi-GPU Systems

Master's thesis (碩士), National Chiao Tung University (國立交通大學), Institute of Computer Science and Engineering (資訊科學與工程研究所), academic year 105 (2016).

Using multiple graphics processing units (GPUs) to accelerate applications has become increasingly popular in recent years, aided by multi-GPU abstraction techniques. However, an application that contains only dependent kernels gains no benefit from multiple GPUs, because its kernels cannot run simultaneously across those GPUs, which lowers GPU utilization. Applications with a 'big' kernel, one that launches a huge number of threads to process massively parallel data, can also reduce the overall throughput of a multi-GPU system: programmers must manually divide such a kernel into several 'small' kernels and dispatch them to different GPUs in order to exploit multiple GPU resources, which imposes an extra burden on them. In this thesis we present XVirtCL, an extension of VirtCL (a GPU abstraction framework) that automatically balances the workload of a kernel across multiple GPUs while accounting for their differing compute capabilities and minimizing the data transferred among them. XVirtCL comprises (1) a kernel analyzer that determines whether a kernel's workload is suitable for partitioning, (2) a workload scheduling algorithm that balances a kernel's workload across multiple GPUs while considering their differing compute capabilities, and (3) a workload partitioner that splits a kernel into multiple sub-kernels with disjoint sub-NDRange spaces. Preliminary experimental results indicate that the proposed framework maximized the throughput of multiple GPUs for applications with big, regular kernels.
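
The record contains no XVirtCL code; as a loose illustration of the capability-aware partitioning idea described in the abstract, the sketch below splits a one-dimensional NDRange across GPUs in proportion to a per-device capability weight, keeping each sub-range aligned to the work-group size so the resulting sub-kernels cover disjoint index ranges. The function name, the scalar capability weights, and the 1-D restriction are illustrative assumptions, not the thesis's actual analyzer, scheduler, or partitioner.

    # Hypothetical sketch (Python): capability-proportional split of a 1-D NDRange.
    # Not XVirtCL code; the device weights and names are illustrative assumptions.
    def partition_ndrange(global_size, local_size, capabilities):
        """Return one (offset, size) pair per GPU.

        global_size  -- total number of work-items (a multiple of local_size)
        local_size   -- work-group size
        capabilities -- relative capability weight of each GPU (e.g. measured throughput)
        """
        assert global_size % local_size == 0
        num_groups = global_size // local_size
        total = float(sum(capabilities))

        # Give each device a share of work-groups proportional to its weight.
        shares = [int(num_groups * c / total) for c in capabilities]
        # Hand any work-groups lost to rounding down to the most capable device.
        best = max(range(len(capabilities)), key=lambda i: capabilities[i])
        shares[best] += num_groups - sum(shares)

        sub_ranges, offset = [], 0
        for groups in shares:
            size = groups * local_size
            sub_ranges.append((offset, size))  # disjoint, contiguous sub-NDRange
            offset += size
        return sub_ranges

    # Example: 1,048,576 work-items, work-groups of 256, one GPU roughly twice
    # as capable as the other.
    print(partition_ndrange(1_048_576, 256, [2.0, 1.0]))
    # -> [(0, 699136), (699136, 349440)]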

Bibliographic Details
Main Author: Chao, Yen-Ting (趙硯廷)
Other Authors: You, Yi-Ping (游逸平)
Chinese Title: 在多圖形處理器架構下考量裝置能力進行工作量分散運算
Format: Thesis (學位論文), 48 pages
Language: Chinese (zh-TW)
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/62772638646338056219