Pipelined Training with Stale Weights in Deep Convolutional Neural Networks

The growth in size and complexity of convolutional neural networks (CNNs) is forcing the partitioning of a network across multiple accelerators during training and pipelining of backpropagation computations over these accelerators. Pipelining results in the use of stale weights. Existing approaches to pipelined training avoid or limit the use of stale weights with techniques that either underutilize accelerators or increase training memory footprint. This paper contributes a pipelined backpropagation scheme that uses stale weights to maximize accelerator utilization and keep memory overhead modest. It explores the impact of stale weights on statistical efficiency and performance using 4 CNNs (LeNet-5, AlexNet, VGG, and ResNet) and shows that when pipelining is introduced in early layers, training with stale weights converges and results in models with inference accuracies comparable to those of nonpipelined training (a drop in accuracy of 0.4%, 4%, 0.83%, and 1.45% for the 4 networks, respectively). However, when pipelining is deeper in the network, inference accuracies drop significantly (up to 12% for VGG and 8.5% for ResNet-20). The paper also contributes a hybrid training scheme that combines pipelined with nonpipelined training to address this drop. The potential for performance improvement of the proposed scheme is demonstrated with a proof-of-concept pipelined backpropagation implementation in PyTorch on 2 GPUs using ResNet-56/110/224/362, achieving speedups of up to 1.8X over a 1-GPU baseline.
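
As a rough illustration of the stale-weight effect described in the abstract, the sketch below (not the authors' implementation) trains a small two-stage CNN in PyTorch, updating the early "pipelined" stage with gradients that are a few steps old while the later stage is updated normally. In the paper's scheme the staleness arises from overlapping micro-batches across two GPUs; here it is emulated on one device with a gradient queue, so only the weight-update pattern is shown, not the speedup. The layer sizes, split point, STALENESS value, and random stand-in data are illustrative assumptions.

# Minimal sketch: emulate stale-weight training for the early layers of a
# two-stage CNN. Not the paper's pipelined implementation; the staleness is
# reproduced with a gradient queue instead of actual cross-GPU overlap.
import torch
import torch.nn as nn

torch.manual_seed(0)
STALENESS = 2  # early-layer updates lag this many steps behind (assumed value)

stage0 = nn.Sequential(            # "early" layers: pipelined, see stale weights
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4),
)
stage1 = nn.Sequential(            # "later" layers: trained with fresh weights
    nn.Flatten(), nn.Linear(32 * 4 * 4, 10),
)
opt0 = torch.optim.SGD(stage0.parameters(), lr=0.05)
opt1 = torch.optim.SGD(stage1.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

grad_queue = []                    # FIFO of stage-0 gradients awaiting application
for step in range(20):             # random tensors stand in for a real data loader
    x = torch.randn(8, 3, 32, 32)
    y = torch.randint(0, 10, (8,))

    opt0.zero_grad()
    opt1.zero_grad()
    loss = loss_fn(stage1(stage0(x)), y)
    loss.backward()

    opt1.step()                    # later layers update immediately (nonpipelined)

    # Early layers: queue this step's gradients and apply the ones computed
    # STALENESS steps ago, so each update is based on weights that are
    # STALENESS versions old -- the effect pipelining has on early layers.
    grad_queue.append([p.grad.detach().clone() for p in stage0.parameters()])
    if len(grad_queue) > STALENESS:
        for p, g in zip(stage0.parameters(), grad_queue.pop(0)):
            p.grad = g
        opt0.step()

    print(f"step {step:2d}  loss {loss.item():.3f}")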

Bibliographic Details
Main Authors: Lifu Zhang, Tarek S. Abdelrahman (Edward S. Rogers Sr. Department of Electrical and Computer Engineering)
Format: Article
Language: English
Published: Hindawi Limited, 2021-01-01
Series: Applied Computational Intelligence and Soft Computing
ISSN: 1687-9732
Online Access: http://dx.doi.org/10.1155/2021/3839543