Improving History-based Task Scheduling using Machine Learning Method on Multiple-GPU Platforms
利用機器學習方法改進在多圖形處理裝置平台上基於歷史資訊的工作排程方法
Main Authors: | Tsai, Yeh-Ning, 蔡也寜 |
---|---|
Other Authors: | You, Yi-Ping, 游逸平 |
Format: | Thesis (47 pages) |
Language: | zh-TW |
Published: | 2016 |
Online Access: | http://ndltd.ncl.edu.tw/handle/75169969401118624835 |
Description:

Master's thesis === National Chiao Tung University (國立交通大學) === Institute of Computer Science and Engineering (資訊科學與工程研究所) === academic year 105 (2016)

The interest in using multiple graphics processing units (GPUs) to accelerate applications has increased in recent years. Several abstraction techniques have been proposed to ease work such as device selection and data transfer among multiple devices, so that programmers can utilize the computing power of multiple GPUs. One well-designed framework, called VirtCL, implements a run-time system that provides a high-level abstraction of OpenCL devices and reduces the programming burden by acting as a layer between the programmer and the native OpenCL run-time system. The layer abstracts multiple devices into a single virtual device and schedules computations and communications among them. VirtCL implements a history-based scheduler that schedules kernel tasks in a contention- and communication-aware manner.

However, the scheduler has two problems: (1) VirtCL assumes that all the underlying GPU devices have the same compute capability, and (2) VirtCL assumes a linear relationship between the execution time of a kernel and its input data size. In fact, the execution time of a kernel is influenced not only by the input data size but also by the characteristics of the kernel. These two assumptions may result in imbalanced schedules, especially when the compute capabilities of the underlying devices vary.

Therefore, this thesis proposes a method for predicting the execution time of a kernel with a machine-learning model that takes both the characteristics of the kernel and the compute capabilities of the underlying devices into consideration. The model construction consists of two phases: (1) clustering and (2) classification. In the clustering phase, the training kernels are clustered into groups with similar performance-scaling behavior across different GPU devices. In the classification phase, a classifier is built that maps the features of a kernel to a cluster. Once built, the model is used at run time to predict the scaling behavior of a kernel from its features; this prediction is combined with the execution history of the kernel to predict its execution time. With more accurate execution-time predictions, the scheduler in VirtCL can make better device-selection decisions on multi-GPU platforms.

Preliminary experimental results indicate that the proposed prediction model had an average prediction error of 31.5% on kernel execution times, and that with the more accurate predictions, overall throughput increased by an average of 24% for synthetic workload traces.