SoftMemoryBox II: A Scalable, Shared Memory Buffer Framework for Accelerating Distributed Training of Large-Scale Deep Neural Networks

Distributed processing using high-performance computing resources is essential for developers to train large-scale deep neural networks (DNNs). The major impediment to distributed DNN training is the communication bottleneck during the parameter exchange among the distributed training workers. The communication bottleneck increases training time and decreases the utilization of the computational resources. Our previous study, SoftMemoryBox (SMB1), demonstrated considerably superior performance compared to the message passing interface (MPI) in the parameter communication of distributed DNN training. However, SMB1 had several disadvantages: limited scalability of distributed DNN training due to the restricted communication bandwidth of a single memory server, no synchronization function for the shared memory buffer, and low portability/usability as a consequence of its kernel-level implementation. This paper proposes a scalable, shared memory buffer framework, called SoftMemoryBox II (SMB2), which overcomes the shortcomings of SMB1. With SMB2, distributed training processes can easily share virtually unified shared memory buffers composed of memory segments provided by remote memory servers and can exchange DNN parameters at high speed through the shared memory buffer. The scalable communication bandwidth of the SMB2 framework reduces distributed DNN training times compared to SMB1. Evaluation results show that the communication bandwidth of the proposed SMB2 is 6.3 times greater than that of SMB1 when the framework is scaled out to use eight memory servers. Moreover, SMB2-based asynchronous distributed training of five DNN models is up to 2.4 times faster than SMB1-based training.
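The abstract describes an architecture in which asynchronous training workers exchange DNN parameters through a virtually unified shared memory buffer whose storage is contributed, as memory segments, by several remote memory servers. The record does not include the actual SMB2 API, so the following is only a minimal Python sketch of that idea; MemorySegment, UnifiedBuffer, and worker_step are hypothetical names invented for illustration, not the real SMB2 interface.

    # Hypothetical sketch of the shared-memory-buffer idea described in the
    # abstract; names and APIs are assumptions, not the actual SMB2 interface.
    import numpy as np


    class MemorySegment:
        """Stands in for the memory that one remote memory server contributes."""
        def __init__(self, size):
            self.data = np.zeros(size, dtype=np.float32)


    class UnifiedBuffer:
        """Presents several remote segments as one flat parameter buffer."""
        def __init__(self, segments):
            self.segments = segments

        def read(self):
            # Gather all segments into one flat parameter vector.
            return np.concatenate([s.data for s in self.segments])

        def write(self, flat):
            # Scatter a flat parameter vector back across the segments.
            start = 0
            for s in self.segments:
                n = len(s.data)
                s.data[:] = flat[start:start + n]
                start += n


    def worker_step(buffer, compute_gradient, lr=0.01):
        """One asynchronous training step: pull parameters, push an update.

        In the asynchronous scheme described in the abstract, each worker does
        this without waiting for the others; here it is a plain function call.
        """
        params = buffer.read()              # pull current parameters
        grad = compute_gradient(params)     # local forward/backward pass
        buffer.write(params - lr * grad)    # push updated parameters


    if __name__ == "__main__":
        # Eight segments mimic the eight-memory-server configuration evaluated
        # in the paper; the "model" is a toy 4096-element parameter vector.
        segments = [MemorySegment(512) for _ in range(8)]
        buffer = UnifiedBuffer(segments)

        def fake_grad(p):
            # Toy "gradient" that drives every parameter toward 1.0.
            return p - 1.0

        for _ in range(100):
            worker_step(buffer, fake_grad)
        print("mean parameter value:", buffer.read().mean())

In the real framework the segments would reside on remote memory servers and the read/write calls would be remote memory operations; everything here is local so the sketch stays self-contained and runnable.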


Bibliographic Details
Main Authors: Shinyoung Ahn (ORCID: 0000-0002-2686-7273), Eunji Lim
Author Affiliation: Supercomputing Technology Research Center, Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access, Vol. 8, pp. 207097-207111
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3038112
Subjects: High-performance computing; distributed computing; SoftMemoryBox II; shared memory buffer; deep neural networks; distributed deep learning
Online Access: https://ieeexplore.ieee.org/document/9260142/