VirtCL: A Framework for OpenCL Device Abstraction and Management

Bibliographic Details
Main Authors: Wu, Han-Jung, 吳翰融
Other Authors: You, Yi-Ping
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/18253720817653292197
Description
Summary: Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 102 === Using multiple GPU devices to accelerate applications has become a growing area of interest in recent years. However, existing heterogeneous programming models, such as OpenCL, abstract the details of GPU devices at the per-device level and require programmers to explicitly schedule their kernel tasks on a system equipped with multiple GPU devices. When multiple applications run on a multi-GPU system, they may compete for certain GPU devices (for example, the first device) while other GPU devices are left unused. Moreover, the distributed memory model defined in OpenCL, in which each device has its own memory space, complicates memory management across multiple GPU devices. In this thesis, we propose a framework called VirtCL, which acts as a layer between programmers and the native OpenCL runtime system. It abstracts multiple devices into a single virtual device and schedules computations and communications among the devices, thereby reducing the programmer's burden. VirtCL comprises two main components: a front-end library, which exposes the primary OpenCL APIs and the virtual device, and a back-end runtime system called CLDaemon, which schedules and dispatches kernels using a history-based kernel scheduler. The front-end library forwards computation requests to CLDaemon, which then schedules and dispatches them. We also propose a history-based scheduler, COST, which schedules kernels in a contention- and data-aware fashion. The experimental results show that the VirtCL framework outperformed the native OpenCL runtime system for most benchmarks in the Rodinia benchmark suite, because the abstraction layer eliminated the heavyweight initialization of OpenCL contexts. The overhead analysis shows that the framework has a small overhead (10.44% on average).
The throughput of the proposed framework was measured under various kernel scheduling policies using the real-world application clsurf and trace-based simulation. The results show that the proposed scheduler outperformed native OpenCL and the other schedulers when the system load was very high, and that it also enabled scalability for applications running on multi-GPU systems.
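
The idea of a history-based, contention- and data-aware scheduler can be illustrated with a minimal Python sketch. This is not the COST algorithm from the thesis, whose cost model the summary does not describe. It assumes a simple cost made of three parts: work already queued on a device (contention), the time to copy buffers that are not yet resident on it (data awareness), and the mean of past runtimes of the same kernel on that device (history). All names here (`Device`, `HistoryScheduler`, `DEFAULT_ESTIMATE`, the bandwidth figures) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    bandwidth: float                 # assumed host-to-device bytes per second
    pending: float = 0.0             # estimated seconds of work already queued
    resident: set = field(default_factory=set)  # ids of buffers already on the device


class HistoryScheduler:
    """Hypothetical sketch: pick the device with the lowest estimated cost."""

    DEFAULT_ESTIMATE = 1e-3          # assumed runtime (s) for a kernel with no history

    def __init__(self, devices):
        self.devices = list(devices)
        self.history = defaultdict(list)   # (kernel, device name) -> observed runtimes

    def record(self, kernel, dev, runtime):
        # Store an observed runtime so later predictions can use it.
        self.history[(kernel, dev.name)].append(runtime)

    def predict(self, kernel, dev):
        # History-based estimate: mean of past runtimes on this device.
        runs = self.history.get((kernel, dev.name))
        return sum(runs) / len(runs) if runs else self.DEFAULT_ESTIMATE

    def cost(self, kernel, buffers, dev):
        # buffers maps buffer id -> size in bytes; only missing buffers are copied.
        missing = sum(size for buf, size in buffers.items()
                      if buf not in dev.resident)
        transfer = missing / dev.bandwidth
        return dev.pending + transfer + self.predict(kernel, dev)

    def schedule(self, kernel, buffers):
        # Choose the cheapest device, then account for the new work and data.
        dev = min(self.devices, key=lambda d: self.cost(kernel, buffers, d))
        dev.pending += self.cost(kernel, buffers, dev)
        dev.resident.update(buffers)
        return dev


gpus = [Device("gpu0", bandwidth=8e9),
        Device("gpu1", bandwidth=8e9, resident={"input"})]
sched = HistoryScheduler(gpus)
chosen = sched.schedule("vector_add", {"input": 64 * 2**20})
print(chosen.name)  # gpu1: the input buffer is already resident there
```

In this sketch a kernel follows its data when the devices are idle, but moves to another device once the queue on the data-holding device costs more than the copy would. A real runtime would also need to decrease `pending` as kernels complete, which is omitted here.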