Improving History-based Task Scheduling using Machine Learning Method on Multiple-GPU Platforms
利用機器學習方法改進在多圖形處理裝置平台上基於歷史資訊的工作排程方法
Main Authors: | Tsai, Yeh-Ning, 蔡也寜 |
---|---|
Other Authors: | You, Yi-Ping, 游逸平 |
Format: | Thesis (47 pages) |
Language: | zh-TW |
Published: | 2016 |
Online Access: | http://ndltd.ncl.edu.tw/handle/75169969401118624835 |
Description:

Master's thesis === National Chiao Tung University (國立交通大學) === Institute of Computer Science and Engineering (資訊科學與工程研究所) === academic year 105 (2016)

The interest in using multiple graphics processing units (GPUs) to accelerate applications has increased in recent years. Several abstraction techniques have been proposed to ease work such as device selection and data transfer among multiple devices, so that programmers can utilize the computing power of multiple GPUs. One well-designed framework, called VirtCL, implements a run-time system that provides a high-level abstraction of OpenCL devices and reduces the programming burden by acting as a layer between the programmer and the native OpenCL run-time system. The layer abstracts multiple devices into a single virtual device and schedules computations and communications among them. VirtCL implements a history-based scheduler that schedules kernel tasks in a contention- and communication-aware manner.

However, the scheduler has two problems: (1) VirtCL assumes that all the underlying GPU devices have the same compute capability, and (2) VirtCL assumes a linear relationship between the execution time of a kernel and its input data size. In fact, the execution time of a kernel is influenced not only by the input data size but also by the characteristics of the kernel. These two assumptions may result in imbalanced schedules, especially when the compute capabilities of the underlying devices vary.

Therefore, this thesis proposes a method for predicting the execution time of a kernel with a machine-learning model that takes both the characteristics of the kernel and the compute capabilities of the underlying devices into consideration. The model construction consists of two phases: (1) clustering and (2) classification. In the clustering phase, the training kernels are clustered into groups with similar performance-scaling behavior across different GPU devices. In the classification phase, a classifier is built that maps the features of a kernel to a cluster. Once built, the model is used at run time to predict the scaling behavior of a kernel from its features; this prediction is combined with the execution history of the kernel to predict its execution time. With more accurate execution-time predictions, the scheduler in VirtCL can make better device-selection decisions on multi-GPU platforms.

Preliminary experimental results indicate that the proposed prediction model had an average prediction error of 31.5% on kernel execution times, and that with the more accurate predictions, overall throughput increased by an average of 24% for synthetic workload traces.