Improving History-based Task Scheduling using Machine Learning Method on Multiple-GPU Platforms

Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 105 ===

Interest in using multiple graphics processing units (GPUs) to accelerate applications has increased in recent years. Several abstraction techniques have been proposed to ease work such as device selection and data transfer among multiple devices, so that programmers can utilize the computing power of multiple GPUs. One well-designed framework, VirtCL, implements a run-time system that provides a high-level abstraction of OpenCL devices, reducing the programming burden by acting as a layer between the programmer and the native OpenCL run-time system. This layer abstracts multiple devices into a single virtual device and schedules computations and communications among them. VirtCL implements a history-based scheduler that schedules kernel tasks in a contention- and communication-aware manner.

However, the scheduler has two problems: (1) VirtCL assumes that all the underlying GPU devices have the same compute capability, and (2) VirtCL assumes a linear relationship between the execution time of a kernel and its input data size. In fact, the execution time of a kernel is influenced not only by the input data size but also by the characteristics of the kernel. These two assumptions can result in imbalanced schedules, especially when the compute capabilities of the underlying devices vary.

This thesis therefore proposes a method for predicting the execution time of a kernel with a machine learning model that takes both the characteristics of the kernel and the compute capability of the underlying devices into consideration. Model construction consists of two phases: (1) clustering and (2) classification. In the clustering phase, training kernels are clustered into groups with similar performance scaling behavior across different GPU devices. In the classification phase, a classifier is built that maps the features of a kernel to a cluster. Once built, the model is used at run time as a predictor that takes the features of a kernel as input and outputs the kernel's predicted scaling behavior; this information is combined with the kernel's execution history to predict its execution time. With more accurate execution-time predictions, the VirtCL scheduler can make better device-selection decisions on multi-GPU platforms.

Preliminary experimental results indicate that the proposed prediction model had an average prediction error of 31.5% on kernel execution times and that, with the more accurate predictions, overall throughput increased by an average of 24% on synthetic workload traces.
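The two-phase model construction described above (cluster training kernels by their cross-device scaling behavior, then train a classifier from static kernel features to cluster labels) can be illustrated with standard tooling. Below is a minimal sketch assuming scikit-learn's KMeans and DecisionTreeClassifier; the abstract does not name the clustering algorithm, the classifier, or the kernel feature set, so those choices and all identifiers here are illustrative assumptions, not the thesis's actual implementation.

```python
# Offline model construction sketch (assumed tooling: scikit-learn).
# Phase 1: cluster kernels by how execution time scales across GPU devices.
# Phase 2: train a classifier mapping static kernel features -> cluster id.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def scaling_vector(times_per_device, ref_device=0):
    # Normalize a kernel's per-device times to a reference device so that
    # kernels with similar scaling behavior land close together.
    t = np.asarray(times_per_device, dtype=float)
    return t / t[ref_device]

def build_model(times, features, n_clusters=4):
    # times[i][d]: measured time of training kernel i on device d.
    # features[i]: static features of kernel i (illustrative; the real
    # feature set is defined in the thesis, not reproduced here).
    X_scale = np.array([scaling_vector(t) for t in times])
    clustering = KMeans(n_clusters=n_clusters, n_init=10).fit(X_scale)
    classifier = DecisionTreeClassifier().fit(features, clustering.labels_)
    # cluster_centers_[c] is the representative scaling vector of cluster c.
    return classifier, clustering.cluster_centers_

# Example: three training kernels measured on two devices.
times = [[1.0, 0.5], [2.0, 1.9], [4.0, 2.1]]
features = [[10, 3], [4, 8], [12, 2]]
clf, centers = build_model(times, features, n_clusters=2)
```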

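At run time, the predicted scaling behavior is combined with the kernel's execution history to estimate per-device execution times, which the scheduler can then use for device selection. The sketch below continues the assumptions above; the queue-wait bookkeeping and every name in it are hypothetical, not VirtCL's actual interface.

```python
import numpy as np

def predict_times(classifier, centers, kernel_features, history_time_ref):
    # history_time_ref: the kernel's observed time on the reference device,
    # taken from the scheduler's execution history (per the abstract).
    cluster = classifier.predict([kernel_features])[0]
    # Scale the historical time by the cluster's per-device scaling vector.
    return history_time_ref * centers[cluster]

def select_device(predicted_times, queue_wait):
    # queue_wait[d]: estimated time until device d drains its queued work
    # (an assumed bookkeeping structure). Pick the earliest finish time.
    finish = np.asarray(queue_wait) + np.asarray(predicted_times)
    return int(np.argmin(finish))

# Example: predict for a kernel with features [10, 3] whose history shows
# 1.2 s on the reference device, then dispatch to the best device.
est = predict_times(clf, centers, [10, 3], history_time_ref=1.2)
device = select_device(est, queue_wait=[0.4, 0.0])
```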

Bibliographic Details
Original Title (Chinese): 利用機器學習方法改進在多圖形處理裝置平台上 基於歷史資訊的工作排程方法
Main Author: Tsai, Yeh-Ning (蔡也寜)
Other Authors: Yi, Ping-You (游逸平)
Format: Others (thesis, 47 pages)
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/75169969401118624835