Scheduling Optimization of Backpropagation for Deep Learning on GPU
Title (Chinese): | 針對圖形處理器上的深度學習反向傳播計算之排程優化 |
---|---|
Main Author: | Cing-Fu Jhu (朱清福) |
Advisor: | Pangfeng Liu (劉邦鋒) |
Degree: | Master's (碩士) |
Institution: | National Taiwan University (國立臺灣大學) |
Department: | Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) |
Academic Year: | 106 |
Format: | Others |
Language: | en_US |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/p6fa2n |
Description:

Many large deep neural network models have been proposed in recent years to achieve more accurate training results. Training these large models requires a huge amount of memory and communication, which has become a challenging issue in improving the performance of deep learning. In this paper, we analyze the data access pattern of training a deep neural network and propose a data pinning algorithm that reduces both the data usage on the GPU and the data movement between a GPU and its CPU host. We show that finding an optimal data movement schedule is NP-complete, and propose a dynamic programming algorithm that finds the optimal solution in pseudo-polynomial time. That is, we observe the access pattern of deep neural network training and propose a specialized GPU data pinning algorithm that minimizes unnecessary data movement. We then implement our dynamic programming algorithm and use it to train real deep learning models. The experiments show that we can pin up to 20% more data into GPU memory than GeePS, a state-of-the-art deep learning framework. We also propose a memory reduction technique for backpropagation in deep learning. We analyze the access pattern of backpropagation and observe that gradient computation and weight update, two major steps that are traditionally performed sequentially, can be partially overlapped. In addition, we analyze the semantics of the computation and show that by delaying the weight update we can avoid the double buffering that read/write conflicts force on a traditional naive parallel implementation. We implement these techniques and observe up to a 75% reduction in memory usage.
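The abstract states that optimal data-movement scheduling is NP-complete and that a pseudo-polynomial dynamic program finds the optimum. The thesis's exact formulation is not reproduced here; as a rough illustration of how such a dynamic program can be pseudo-polynomial in the GPU memory budget, the sketch below pins layer buffers knapsack-style. The buffer sizes, saved transfer costs, and the `pin_layers` function are all hypothetical, not taken from the thesis.

```python
# Hedged sketch: a knapsack-style dynamic program for GPU data pinning.
# Assumptions (illustrative only): each layer buffer i has an integer size[i]
# in MB and a transfer cost cost[i] that is saved if the buffer stays pinned
# in GPU memory; the goal is to maximize the saved transfer cost under an
# integer GPU memory budget.  Runtime is O(n * budget), i.e. pseudo-polynomial.

def pin_layers(sizes, costs, gpu_budget):
    """Return (best_saved_cost, indices_of_pinned_buffers)."""
    n = len(sizes)
    # dp[b] = best saved transfer cost using at most b MB of GPU memory
    dp = [0] * (gpu_budget + 1)
    choice = [[False] * (gpu_budget + 1) for _ in range(n)]
    for i in range(n):
        # Iterate the budget downward so each buffer is pinned at most once.
        for b in range(gpu_budget, sizes[i] - 1, -1):
            if dp[b - sizes[i]] + costs[i] > dp[b]:
                dp[b] = dp[b - sizes[i]] + costs[i]
                choice[i][b] = True
    # Walk the choice table backwards to recover which buffers were pinned.
    pinned, b = [], gpu_budget
    for i in range(n - 1, -1, -1):
        if choice[i][b]:
            pinned.append(i)
            b -= sizes[i]
    return dp[gpu_budget], sorted(pinned)

# Example: buffers of 4, 3, 2 MB with saved costs 9, 7, 4 under a 5 MB budget.
print(pin_layers([4, 3, 2], [9, 7, 4], 5))   # -> (11, [1, 2])
```

In this toy instance, pinning the 3 MB and 2 MB buffers into the 5 MB budget saves a cost of 11, which beats pinning the single 4 MB buffer (cost 9).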
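The second contribution overlaps gradient computation with weight updates: each layer's update is delayed until its weights are no longer read by the backward pass, so the update can be applied in place without the second weight buffer a naive parallel scheme would need. The minimal numpy sketch below illustrates that ordering; the `Dense` class, `apply()` interface, and plain SGD step are assumptions for illustration, not the thesis's implementation.

```python
# Hedged sketch of the delayed-weight-update ordering described in the
# abstract.  A layer's backward step reads its current weights to propagate
# the delta to the previous layer; only after that is the in-place update
# applied, so no duplicate ("double buffered") copy of the weights is needed.
import numpy as np

class Dense:
    def __init__(self, n_in, n_out, rng):
        self.w = rng.standard_normal((n_in, n_out)) * 0.1
        self.x = None                        # input cached by the forward pass

    def forward(self, x):
        self.x = x
        return x @ self.w

    def backward(self, delta):
        grad_w = self.x.T @ delta            # gradient computation (reads x)
        prev_delta = delta @ self.w.T        # delta propagation (reads w)
        return grad_w, prev_delta

    def apply(self, grad_w, lr):
        self.w -= lr * grad_w                # in-place update, no extra buffer


def backward_with_delayed_updates(layers, delta, lr=0.01):
    for layer in reversed(layers):
        grad_w, prev_delta = layer.backward(delta)
        # A naive parallel scheme would launch apply() as soon as grad_w is
        # ready, conflicting with the read of layer.w inside backward() and
        # forcing a second weight buffer.  Delaying apply() until here lets it
        # run in place and overlap with the backward work of earlier layers.
        layer.apply(grad_w, lr)
        delta = prev_delta


rng = np.random.default_rng(0)
net = [Dense(8, 16, rng), Dense(16, 4, rng)]
out = rng.standard_normal((2, 8))
for layer in net:
    out = layer.forward(out)
backward_with_delayed_updates(net, np.ones_like(out))
```

On a real GPU, the delayed `apply()` step would typically be issued on a separate stream so that the update kernels overlap with the backward kernels of the earlier layers, which is the partial overlap of gradient computation and weight update that the abstract describes.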