Capability-Aware Workload Partition on Multi-GPU Systems
Master's thesis === National Chiao Tung University === Institute of Computer Science and Engineering === Academic year 105 === (The abstract is given in full under the description field below.)
Main Authors: | Chao, Yen-Ting 趙硯廷 |
---|---|
Other Authors: | You, Yi-Ping 游逸平 |
Format: | Others |
Language: | zh-TW |
Published: | 2016 |
Online Access: | http://ndltd.ncl.edu.tw/handle/62772638646338056219 |
id | ndltd-TW-105NCTU5394094 |
record_format | oai_dc |
spelling |
ndltd-TW-105NCTU5394094 2017-09-07T04:17:58Z http://ndltd.ncl.edu.tw/handle/62772638646338056219 Capability-Aware Workload Partition on Multi-GPU Systems 在多圖形處理器架構下考量裝置能力進行工作量分散運算 Chao, Yen-Ting 趙硯廷. Master's thesis === National Chiao Tung University === Institute of Computer Science and Engineering === Academic year 105. Advisor: You, Yi-Ping 游逸平. 2016. Thesis (學位論文), 48 pages, zh-TW. (The abstract is reproduced in full under the description field below.) |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description |
Master's thesis === National Chiao Tung University === Institute of Computer Science and Engineering === Academic year 105 === Using multiple graphics processing units (GPUs) to accelerate applications has become increasingly popular in recent years, aided by multi-GPU abstraction techniques. However, an application that contains only dependent kernels derives no benefit from multiple GPUs, since its kernels cannot run simultaneously on those GPUs, which lowers GPU utilization. Applications with a ‘big’ kernel, one that launches a huge number of threads to process massively parallel data, can also lower the overall throughput of a multi-GPU system. To exploit the resources of multiple GPUs, programmers must manually divide such a kernel into several ‘small’ kernels and dispatch them to different GPUs, which imposes an extra burden on programmers. In this paper, we present XVirtCL, an extension of VirtCL (a GPU abstraction framework) that automatically balances the workload of a kernel among multiple GPUs while accounting for the GPUs' differing compute capability levels and minimizing the data transferred among them. XVirtCL comprises (1) a kernel analyzer that determines whether the workload of a kernel is suitable for partitioning, (2) a workload scheduling algorithm that balances the workload of a kernel among multiple GPUs while accounting for their differing compute capability levels, and (3) a workload partitioner that splits a kernel into multiple sub-kernels with disjoint sub-NDRange spaces. The preliminary experimental results indicate that the proposed framework maximized the throughput of multiple GPUs for applications with big, regular kernels.
|
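The description field above names XVirtCL's three components but, being an abstract, does not spell out how a kernel's index space is actually divided. As a rough, hedged illustration of the general idea only (not the thesis's algorithm), the C sketch below splits a one-dimensional NDRange into disjoint sub-ranges whose sizes are proportional to assumed per-device capability weights; in an OpenCL host program, each resulting offset/size pair could be passed to clEnqueueNDRangeKernel as the global_work_offset and global_work_size arguments so that every GPU processes a different slice of the same kernel. The SubRange type, the split_ndrange function, and the 4:2:1 weights are hypothetical.

```c
/*
 * Illustrative sketch only: one way to split a 1-D NDRange into disjoint,
 * capability-weighted sub-ranges. The weights, the rounding policy, and all
 * identifiers here are assumptions for illustration, not XVirtCL's actual
 * scheduling algorithm.
 */
#include <stdio.h>
#include <stddef.h>

typedef struct {
    size_t offset; /* would become clEnqueueNDRangeKernel's global_work_offset */
    size_t size;   /* would become its global_work_size */
} SubRange;

/* Split `global_size` work-items across `n` devices in proportion to each
 * device's capability weight. Every share except the last is rounded down to
 * a multiple of the work-group size `local_size`; the last device takes the
 * remainder, so the sub-ranges are disjoint and cover the range exactly. */
static void split_ndrange(size_t global_size, size_t local_size,
                          const double *weight, int n, SubRange *out)
{
    double total = 0.0;
    for (int i = 0; i < n; ++i)
        total += weight[i];

    size_t offset = 0;
    for (int i = 0; i < n; ++i) {
        size_t share;
        if (i == n - 1) {
            share = global_size - offset;      /* remainder goes to the last GPU */
        } else {
            share = (size_t)((double)global_size * (weight[i] / total));
            share -= share % local_size;       /* keep whole work-groups together */
        }
        out[i].offset = offset;
        out[i].size = share;
        offset += share;
    }
}

int main(void)
{
    /* Example: three GPUs with assumed relative capabilities of 4 : 2 : 1. */
    const double weight[3] = { 4.0, 2.0, 1.0 };
    SubRange sub[3];

    split_ndrange((size_t)1 << 20, 256, weight, 3, sub);

    for (int i = 0; i < 3; ++i)
        printf("GPU %d: offset = %zu, size = %zu\n", i, sub[i].offset, sub[i].size);
    return 0;
}
```

Rounding every non-final share down to a multiple of the work-group size keeps each work-group (and any barriers inside it) on a single device, and assigning the remainder to the last device guarantees that the sub-ranges tile the original range exactly.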
author2 | You, Yi-Ping |
author | Chao, Yen-Ting 趙硯廷 |
title | Capability-Aware Workload Partition on Multi-GPU Systems |
publishDate | 2016 |
url | http://ndltd.ncl.edu.tw/handle/62772638646338056219 |