Pipelined Training with Stale Weights in Deep Convolutional Neural Networks
The growth in size and complexity of convolutional neural networks (CNNs) is forcing the partitioning of a network across multiple accelerators during training and pipelining of backpropagation computations over these accelerators. Pipelining results in the use of stale weights. Existing approaches to pipelined training avoid or limit the use of stale weights with techniques that either underutilize accelerators or increase training memory footprint. This paper contributes a pipelined backpropagation scheme that uses stale weights to maximize accelerator utilization and keep memory overhead modest. It explores the impact of stale weights on statistical efficiency and performance using 4 CNNs (LeNet-5, AlexNet, VGG, and ResNet) and shows that when pipelining is introduced in early layers, training with stale weights converges and results in models with inference accuracies comparable to those of nonpipelined training (a drop in accuracy of 0.4%, 4%, 0.83%, and 1.45% for the 4 networks, respectively). However, when pipelining is deeper in the network, inference accuracies drop significantly (up to 12% for VGG and 8.5% for ResNet-20). The paper also contributes a hybrid training scheme that combines pipelined with nonpipelined training to address this drop. The potential for performance improvement of the proposed scheme is demonstrated with a proof-of-concept pipelined backpropagation implementation in PyTorch on 2 GPUs using ResNet-56/110/224/362, achieving speedups of up to 1.8X over a 1-GPU baseline.
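As a rough illustration only, the following single-process PyTorch sketch mimics the weight staleness that such a pipeline introduces: a small CNN is split into an "early" and a "late" stage, the early stage runs its forward and backward passes with weights that lag the most recent update by a fixed number of steps, and the resulting gradients are applied to the newer, live weights. It is not the authors' implementation, and it does not reproduce the two-GPU overlap that yields the reported speedup; names such as `STALENESS`, `stage1`, and the toy random data are assumptions made for the example.

```python
# Minimal sketch of stale-weight pipelined backpropagation (illustrative only).
# The early stage is evaluated with a weight version that is STALENESS updates
# old, and the gradients it produces are applied to the live weights, which is
# the kind of mismatch that arises when a pipeline is never drained.

import copy
from collections import deque

import torch
import torch.nn as nn

STALENESS = 2  # assumed pipeline depth: how far forward/backward lag the live weights
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Split a small CNN into an "early" and a "late" stage, standing in for
# partitioning a larger network (e.g., a ResNet) across two accelerators.
stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                       nn.AdaptiveAvgPool2d(4)).to(device)
stage2 = nn.Sequential(nn.Flatten(), nn.Linear(32 * 4 * 4, 10)).to(device)

opt = torch.optim.SGD(list(stage1.parameters()) + list(stage2.parameters()), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

history = deque(maxlen=STALENESS + 1)   # recent weight versions of the early stage
stale_stage1 = copy.deepcopy(stage1)    # copy that runs with the lagged weights

for step in range(50):
    # Record the current live weights; the pipeline "fills" during the first
    # STALENESS steps, after which the forward pass lags by STALENESS updates.
    history.append(copy.deepcopy(stage1.state_dict()))
    stale_stage1.load_state_dict(history[0])

    x = torch.randn(8, 3, 32, 32, device=device)      # toy micro-batch
    y = torch.randint(0, 10, (8,), device=device)

    out = stage2(stale_stage1(x))   # early layers use stale weights; the last stage does not
    loss = loss_fn(out, y)

    opt.zero_grad()
    for p in stale_stage1.parameters():
        p.grad = None
    loss.backward()

    # Gradients computed against the stale weights are applied to the live ones.
    for live, stale in zip(stage1.parameters(), stale_stage1.parameters()):
        live.grad = stale.grad
    opt.step()
```

In an actual pipelined deployment the two stages would sit on separate accelerators and process different micro-batches concurrently, which is where the speedup reported in the paper comes from; this sketch only reproduces the numerics of training with stale weights.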
Main Authors: | Lifu Zhang, Tarek S. Abdelrahman |
---|---|
Affiliation: | Edward S. Rogers Sr. Department of Electrical and Computer Engineering |
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2021-01-01 |
Series: | Applied Computational Intelligence and Soft Computing |
ISSN: | 1687-9732 |
Online Access: | http://dx.doi.org/10.1155/2021/3839543 |