Summary: | Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 105 === Using multiple graphics processing units (GPUs) to accelerate applications has become increasingly
popular in recent years, with the assistance of multi-GPU abstraction techniques.
However, an application whose kernels all depend on one another derives no benefit from
multiple GPUs, since its kernels cannot run simultaneously on those
GPUs, which leaves the GPUs underutilized. Applications that have a ‘big’ kernel, which
launches a huge number of threads for processing massively parallel data, can also lower the
overall throughput of a multi-GPU system. Such an application requires programmers to manually
divide the kernel into several ‘small’ kernels and dispatch the kernels on different GPUs so
as to utilize multiple GPU resources, which imposes an extra burden on programmers. In this
paper, we present XVirtCL, an extension of VirtCL (a GPU abstraction framework) that
automatically balances the workload of a kernel among multiple GPUs while accounting for the
differing compute capabilities of the GPUs and minimizing the data transferred among them.
XVirtCL comprises (1) a kernel analyzer that determines whether the workload of a kernel is
suitable for partitioning, (2) a workload scheduling algorithm that balances the workload of
a kernel among multiple GPUs according to their differing compute capabilities,
and (3) a workload partitioner that splits a kernel into multiple sub-kernels with
disjoint sub-NDRange spaces. The preliminary experimental results indicate that the proposed
framework maximized the throughput of multiple GPUs for applications with big, regular
kernels.
|
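
The idea of splitting one kernel into sub-kernels with disjoint sub-NDRange spaces, sized according to each GPU's compute capability, can be illustrated with a minimal sketch. This is not XVirtCL's actual scheduling algorithm, which the abstract does not describe; the function name `partition_ndrange`, the one-dimensional NDRange, and the use of positive integer weights as a stand-in for relative compute capability are all assumptions made for illustration.

```python
def partition_ndrange(global_size, local_size, weights):
    """Split the 1-D index space [0, global_size) into disjoint
    (offset, size) sub-ranges, one per device.

    Hypothetical helper: `weights` are positive integers standing in for
    the relative compute capability of each GPU. Boundaries are aligned
    to whole work-groups so every sub-range is a valid NDRange.
    """
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("global_size must be a positive multiple of local_size")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive integers")

    num_groups = global_size // local_size
    total = sum(weights)

    ranges = []
    offset = 0
    cumulative = 0
    for w in weights:
        cumulative += w
        # End boundary (in work-groups) proportional to the cumulative weight.
        # The last device always ends exactly at num_groups.
        end = (num_groups * cumulative // total) * local_size
        ranges.append((offset, end - offset))
        offset = end
    return ranges


if __name__ == "__main__":
    # One GPU assumed twice as capable as each of the other two.
    for device, (off, size) in enumerate(partition_ndrange(1024, 64, [2, 1, 1])):
        print(f"device {device}: offset={off}, size={size}")
```

In an OpenCL setting, each (offset, size) pair would correspond to the `global_work_offset` and `global_work_size` arguments of a separate `clEnqueueNDRangeKernel` call on a different device. A device with a small weight may receive an empty range when there are few work-groups, and a real scheduler would also have to account for the data each sub-kernel reads and writes.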