Scheduling Optimization of Backpropagation for Deep Learning on GPU
Title (Chinese): | 針對圖形處理器上的深度學習反向傳播計算之排程優化 |
---|---|
Main Author: | Cing-Fu Jhu (朱清福) |
Advisor: | Pangfeng Liu (劉邦鋒) |
Degree: | Master's (碩士) |
Institution: | National Taiwan University (國立臺灣大學) |
Department: | Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) |
Academic Year: | 106 |
Format: | Others |
Language: | en_US |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/p6fa2n |
Description:

Many large deep neural network models have been proposed in recent years to achieve more accurate training results. Training these large models requires a huge amount of memory and communication, which has become a challenging issue in improving the performance of deep learning. In this paper, we analyze the data access pattern of training a deep neural network and propose a data pinning algorithm that reduces both the data usage on the GPU and the data movement between a GPU and its CPU host. We show that finding an optimal data movement schedule is NP-complete, and propose a dynamic programming algorithm that finds the optimal solution in pseudo-polynomial time. That is, we observe the access pattern of deep neural network training and propose a specialized GPU data pinning algorithm that minimizes unnecessary data movement. We then implement our dynamic programming algorithm and use it to train real deep learning models. The experiments show that we can pin up to 20% more data into GPU memory than GeePS, a state-of-the-art deep learning framework. We also propose a memory reduction technique for backpropagation in deep learning. We analyze the access pattern of backpropagation and observe that gradient computation and weight update, two major steps that are traditionally performed sequentially, can be partially overlapped. In addition, we analyze the semantics of the computation and show that by delaying the weight update we can avoid the double buffering that read/write conflicts force on a traditional naive parallel implementation. We implement these techniques and observe up to a 75% reduction in memory usage.
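The abstract states that optimal data-movement scheduling is NP-complete and that a pseudo-polynomial dynamic program finds the optimum. The thesis's exact formulation is not reproduced here; as a rough illustration of how such a dynamic program can be pseudo-polynomial in the GPU memory budget, the sketch below pins layer buffers knapsack-style. The buffer sizes, saved transfer costs, and the `pin_layers` function are all hypothetical, not taken from the thesis.

```python
# Hedged sketch: a knapsack-style dynamic program for GPU data pinning.
# Assumptions (illustrative only): each layer buffer i has an integer size[i]
# in MB and a transfer cost cost[i] that is saved if the buffer stays pinned
# in GPU memory; the goal is to maximize the saved transfer cost under an
# integer GPU memory budget.  Runtime is O(n * budget), i.e. pseudo-polynomial.

def pin_layers(sizes, costs, gpu_budget):
    """Return (best_saved_cost, indices_of_pinned_buffers)."""
    n = len(sizes)
    # dp[b] = best saved transfer cost using at most b MB of GPU memory
    dp = [0] * (gpu_budget + 1)
    choice = [[False] * (gpu_budget + 1) for _ in range(n)]
    for i in range(n):
        # Iterate the budget downward so each buffer is pinned at most once.
        for b in range(gpu_budget, sizes[i] - 1, -1):
            if dp[b - sizes[i]] + costs[i] > dp[b]:
                dp[b] = dp[b - sizes[i]] + costs[i]
                choice[i][b] = True
    # Walk the choice table backwards to recover which buffers were pinned.
    pinned, b = [], gpu_budget
    for i in range(n - 1, -1, -1):
        if choice[i][b]:
            pinned.append(i)
            b -= sizes[i]
    return dp[gpu_budget], sorted(pinned)

# Example: buffers of 4, 3, 2 MB with saved costs 9, 7, 4 under a 5 MB budget.
print(pin_layers([4, 3, 2], [9, 7, 4], 5))   # -> (11, [1, 2])
```

In this toy instance, pinning the 3 MB and 2 MB buffers into the 5 MB budget saves a cost of 11, which beats pinning the single 4 MB buffer (cost 9).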
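The second contribution overlaps gradient computation with weight updates: each layer's update is delayed until its weights are no longer read by the backward pass, so the update can be applied in place without the second weight buffer a naive parallel scheme would need. The minimal numpy sketch below illustrates that ordering; the `Dense` class, `apply()` interface, and plain SGD step are assumptions for illustration, not the thesis's implementation.

```python
# Hedged sketch of the delayed-weight-update ordering described in the
# abstract.  A layer's backward step reads its current weights to propagate
# the delta to the previous layer; only after that is the in-place update
# applied, so no duplicate ("double buffered") copy of the weights is needed.
import numpy as np

class Dense:
    def __init__(self, n_in, n_out, rng):
        self.w = rng.standard_normal((n_in, n_out)) * 0.1
        self.x = None                        # input cached by the forward pass

    def forward(self, x):
        self.x = x
        return x @ self.w

    def backward(self, delta):
        grad_w = self.x.T @ delta            # gradient computation (reads x)
        prev_delta = delta @ self.w.T        # delta propagation (reads w)
        return grad_w, prev_delta

    def apply(self, grad_w, lr):
        self.w -= lr * grad_w                # in-place update, no extra buffer


def backward_with_delayed_updates(layers, delta, lr=0.01):
    for layer in reversed(layers):
        grad_w, prev_delta = layer.backward(delta)
        # A naive parallel scheme would launch apply() as soon as grad_w is
        # ready, conflicting with the read of layer.w inside backward() and
        # forcing a second weight buffer.  Delaying apply() until here lets it
        # run in place and overlap with the backward work of earlier layers.
        layer.apply(grad_w, lr)
        delta = prev_delta


rng = np.random.default_rng(0)
net = [Dense(8, 16, rng), Dense(16, 4, rng)]
out = rng.standard_normal((2, 8))
for layer in net:
    out = layer.forward(out)
backward_with_delayed_updates(net, np.ones_like(out))
```

On a real GPU, the delayed `apply()` step would typically be issued on a separate stream so that the update kernels overlap with the backward kernels of the earlier layers, which is the partial overlap of gradient computation and weight update that the abstract describes.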