Communication Optimization Schemes for Accelerating Distributed Deep Learning Systems

In a distributed deep learning system, a parameter server and workers must communicate to exchange gradients and parameters, and the communication cost increases with the number of workers. This paper presents a communication data optimization scheme to mitigate the decrease in throughput caused by communication performance bottlenecks in distributed deep learning. We propose two methods to optimize communication. The first is a layer dropping scheme that reduces the amount of communicated data: a representative value of each hidden layer is compared with a threshold, and, to preserve training accuracy, gradients that are not transmitted to the parameter server are stored in the worker's local cache. When the value of the gradients accumulated in the local cache exceeds the threshold, they are transmitted to the parameter server. The second is an efficient threshold selection method, which computes the threshold from the L1 norm of each hidden layer instead of from the raw gradients. Our data optimization scheme reduces communication time by about 81% and total training time by about 70% in a 56 Gbit/s network environment.
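The following is a minimal sketch of how the layer dropping scheme summarized above could look in code. It is not the authors' implementation: the mean-absolute-value representative (L1 norm divided by layer size), the accumulate-then-compare handling of the local cache, and the keep_ratio parameter of the threshold helper are assumptions made for illustration only.

```python
# Illustrative sketch of the layer-dropping idea from the abstract, NOT the
# authors' implementation. Representative values, cache handling, and the
# keep_ratio-based threshold helper are assumptions for this example.
import numpy as np


class LayerDroppingWorker:
    """Keeps a per-layer local cache of gradients that were not sent.

    A layer's gradient is transmitted to the parameter server only when the
    representative value of the accumulated gradient (here, its L1 norm
    divided by the layer size) exceeds a threshold.
    """

    def __init__(self, layer_names, threshold):
        self.threshold = threshold
        # Local cache: accumulated, not-yet-transmitted gradients per layer.
        self.cache = {name: None for name in layer_names}

    def select_layers_to_send(self, gradients):
        """Return {layer: gradient} for layers whose cached gradient exceeds
        the threshold; keep the remaining layers in the local cache."""
        to_send = {}
        for name, grad in gradients.items():
            # Accumulate the new gradient into the local cache.
            cached = grad if self.cache[name] is None else self.cache[name] + grad
            # Representative value of the layer: L1 norm scaled by layer size.
            representative = np.abs(cached).sum() / cached.size
            if representative > self.threshold:
                to_send[name] = cached      # transmit the accumulated gradient
                self.cache[name] = None     # clear the cache for this layer
            else:
                self.cache[name] = cached   # defer transmission
        return to_send


def l1_threshold(gradients, keep_ratio=0.5):
    """Assumed threshold rule: rank the per-layer L1-norm representatives and
    pick the value that lets roughly `keep_ratio` of the layers through."""
    reps = sorted(np.abs(g).sum() / g.size for g in gradients.values())
    index = int(len(reps) * (1.0 - keep_ratio))
    return reps[min(index, len(reps) - 1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = {f"layer{i}": rng.normal(scale=0.1 * (i + 1), size=(64,)) for i in range(4)}
    worker = LayerDroppingWorker(grads.keys(), threshold=l1_threshold(grads))
    sent = worker.select_layers_to_send(grads)
    print("layers transmitted this step:", sorted(sent))
```

Running the example transmits only the layers whose accumulated gradients exceed the L1-norm-based threshold; the other gradients remain in the worker's local cache until they grow large enough to send, which is the accuracy-preserving behaviour the abstract describes.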

Bibliographic Details
Main Authors: Jaehwan Lee, Hyeonseong Choi, Hyeonwoo Jeong, Baekhyeon Noh, Ji Sun Shin
Author Affiliations: School of Electronics and Information Engineering, Korea Aerospace University, Goyang-si 10540, Korea (Lee, Choi, Jeong, Noh); Department of Computer and Information Security, Sejong University, Seoul 05006, Korea (Shin)
Format: Article
Language: English
Published: MDPI AG, 2020-12-01
Series: Applied Sciences, Vol. 10, No. 24, Article 8846
ISSN: 2076-3417
DOI: 10.3390/app10248846
Subjects: distributed deep learning; multi-GPU; data parallelism; communication optimization
Online Access: https://www.mdpi.com/2076-3417/10/24/8846