Scheduling Optimization of Backpropagation for Deep Learning on GPU

Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 106 === Many large deep neural network models have been proposed in recent years to achieve more accurate training results. Training these large models requires a huge amount of memory and communication, which has become a challenging issue in improving the perform...

Bibliographic Details
Main Authors: Cing-Fu Jhu, 朱清福
Other Authors: Pangfeng Liu
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/p6fa2n
id ndltd-TW-106NTU05392091
record_format oai_dc
spelling ndltd-TW-106NTU053920912019-07-25T04:46:48Z http://ndltd.ncl.edu.tw/handle/p6fa2n Scheduling Optimization of Backpropagation for Deep Learning on GPU 針對圖形處理器上的深度學習反向傳播計算之排程優化 Cing-Fu Jhu 朱清福 Master's National Taiwan University Graduate Institute of Computer Science and Information Engineering 106 Many large deep neural network models have been proposed in recent years to achieve more accurate training results. Training these large models requires a huge amount of memory and communication, which has become a challenging issue in improving the performance of deep learning. In this paper, we analyze the data access pattern of training a deep neural network and propose a data pinning algorithm that reduces data usage on the GPU and data movement between a GPU and its CPU host. We show that finding an optimal data movement schedule is NP-complete, and propose a dynamic programming algorithm that finds the optimal solution in pseudo-polynomial time. That is, we observe the access pattern of deep neural network training and propose a specialized GPU data pinning algorithm that minimizes unnecessary data movement. We then apply our dynamic programming algorithm to train real deep learning models. The experiments show that we can pin up to 20% more data into GPU memory than GeePS, a state-of-the-art deep learning framework. We also propose a memory reduction technique for backpropagation in deep learning. We analyze the access pattern of backpropagation and observe that gradient computation and weight update, two major steps that are traditionally performed sequentially, can be partially overlapped. In addition, we analyze the semantics of the computation and find that by delaying the weight update we can avoid the double buffering caused by read/write conflicts in a traditional naive parallel implementation. We implement our techniques and observe up to a 75% reduction in memory usage. Pangfeng Liu 劉邦鋒 2018 學位論文 ; thesis 29 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 106 === Many large deep neural network models have been proposed in recent years to achieve more accurate training results. Training these large models requires a huge amount of memory and communication, which has become a challenging issue in improving the performance of deep learning. In this paper, we analyze the data access pattern of training a deep neural network and propose a data pinning algorithm that reduces data usage on the GPU and data movement between a GPU and its CPU host. We show that finding an optimal data movement schedule is NP-complete, and propose a dynamic programming algorithm that finds the optimal solution in pseudo-polynomial time. That is, we observe the access pattern of deep neural network training and propose a specialized GPU data pinning algorithm that minimizes unnecessary data movement. We then apply our dynamic programming algorithm to train real deep learning models. The experiments show that we can pin up to 20% more data into GPU memory than GeePS, a state-of-the-art deep learning framework. We also propose a memory reduction technique for backpropagation in deep learning. We analyze the access pattern of backpropagation and observe that gradient computation and weight update, two major steps that are traditionally performed sequentially, can be partially overlapped. In addition, we analyze the semantics of the computation and find that by delaying the weight update we can avoid the double buffering caused by read/write conflicts in a traditional naive parallel implementation. We implement our techniques and observe up to a 75% reduction in memory usage.
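The description mentions a pseudo-polynomial dynamic program for deciding which data to pin in GPU memory, but not its exact formulation. The sketch below is a minimal stand-in under a simplifying assumption: treat pinning as a 0/1 knapsack over per-layer tensors, where pinning a tensor avoids a known CPU-GPU transfer cost and consumes part of a fixed GPU memory budget. The function pin_layers and its parameters (sizes, transfer_costs, capacity) are hypothetical names for illustration, not the thesis's actual algorithm or API.

```python
# Illustrative sketch only: a generic knapsack-style dynamic program that is
# pseudo-polynomial in the memory budget.  It is NOT the thesis's formulation;
# all names and inputs here are hypothetical.

def pin_layers(sizes, transfer_costs, capacity):
    """Return (best_saving, pinned_layer_indices).

    sizes[i]          -- memory footprint of layer i's data (integer units)
    transfer_costs[i] -- CPU<->GPU transfer cost avoided if layer i is pinned
    capacity          -- GPU memory budget, in the same integer units
    """
    n = len(sizes)
    # dp[c] = best saving achievable with at most c units of GPU memory
    dp = [0] * (capacity + 1)
    choice = [[False] * (capacity + 1) for _ in range(n)]

    for i in range(n):
        # iterate capacity downwards so each layer is pinned at most once
        for c in range(capacity, sizes[i] - 1, -1):
            if dp[c - sizes[i]] + transfer_costs[i] > dp[c]:
                dp[c] = dp[c - sizes[i]] + transfer_costs[i]
                choice[i][c] = True

    # backtrack to recover which layers were pinned
    pinned, c = [], capacity
    for i in range(n - 1, -1, -1):
        if choice[i][c]:
            pinned.append(i)
            c -= sizes[i]
    return dp[capacity], sorted(pinned)


if __name__ == "__main__":
    saving, pinned = pin_layers(sizes=[4, 3, 2, 5],
                                transfer_costs=[9, 7, 3, 8],
                                capacity=8)
    print(saving, pinned)   # 16, [0, 1] for this toy input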
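The description also states that gradient computation and weight update can be partially overlapped, and that delaying the weight update avoids the double buffering caused by read/write conflicts. The single-threaded NumPy toy below illustrates only that ordering idea under one possible reading: each layer's weights are read to propagate the gradient before they are overwritten in place, and the update is deferred by one layer so that, in a real GPU implementation, it could overlap with the next layer's gradient computation. All names are hypothetical; this is not the thesis's scheduler.

```python
# Illustrative sketch only: delayed in-place weight updates for a stack of
# plain linear layers, showing why no second copy of the weights is needed.
import numpy as np

def backward_with_delayed_updates(weights, activations, grad_out, lr=0.01):
    """weights[i]: (in_dim, out_dim); activations[i]: saved input to layer i."""
    pending = None   # (layer index, weight gradient) awaiting its update
    grad = grad_out
    for i in reversed(range(len(weights))):
        # Gradient w.r.t. this layer's weights (no activation functions,
        # to keep the sketch short).
        grad_w = activations[i].T @ grad
        # Propagate to the previous layer: this READS weights[i] ...
        grad = grad @ weights[i].T
        # ... so only now is it safe to overwrite weights[i] in place.
        # The update is deferred by one layer to mimic overlapping it with
        # the next layer's gradient computation.
        if pending is not None:
            j, g = pending
            weights[j] -= lr * g          # in-place update, single buffer
        pending = (i, grad_w)
    # Flush the last deferred update once backpropagation has finished.
    j, g = pending
    weights[j] -= lr * g
    return grad  # gradient w.r.t. the network input


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = [rng.standard_normal((8, 8)) for _ in range(3)]
    acts = [rng.standard_normal((4, 8)) for _ in range(3)]  # saved inputs
    g_in = backward_with_delayed_updates(W, acts, rng.standard_normal((4, 8)))
    print(g_in.shape)  # (4, 8)
```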
author2 Pangfeng Liu
author_facet Pangfeng Liu
Cing-Fu Jhu
朱清福
author Cing-Fu Jhu
朱清福
spellingShingle Cing-Fu Jhu
朱清福
Scheduling Optimization of Backpropagation for Deep Learning on GPU
author_sort Cing-Fu Jhu
title Scheduling Optimization of Backpropagation for Deep Learning on GPU
title_short Scheduling Optimization of Backpropagation for Deep Learning on GPU
title_full Scheduling Optimization of Backpropagation for Deep Learning on GPU
title_fullStr Scheduling Optimization of Backpropagation for Deep Learning on GPU
title_full_unstemmed Scheduling Optimization of Backpropagation for Deep Learning on GPU
title_sort scheduling optimization of backpropagation for deep learning on gpu
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/p6fa2n
work_keys_str_mv AT cingfujhu schedulingoptimizationofbackpropagationfordeeplearningongpu
AT zhūqīngfú schedulingoptimizationofbackpropagationfordeeplearningongpu
AT cingfujhu zhēnduìtúxíngchùlǐqìshàngdeshēndùxuéxífǎnxiàngchuánbōjìsuànzhīpáichéngyōuhuà
AT zhūqīngfú zhēnduìtúxíngchùlǐqìshàngdeshēndùxuéxífǎnxiàngchuánbōjìsuànzhīpáichéngyōuhuà
_version_ 1719229989551865856