DisSAGD: A Distributed Parameter Update Scheme Based on Variance Reduction
Machine learning models often converge slowly and unstably because of the high variance of the stochastic gradient estimates used in SGD. To speed up convergence and improve stability, this study proposes DisSAGD, a distributed SGD algorithm based on variance reduction. DisSAGD corrects the gradient estimate at each iteration using the gradient variance of historical iterations, without full gradient computation or additional storage; that is, it reduces the mean variance of historical gradients in order to reduce the error in the parameter updates. We implemented DisSAGD on distributed clusters, training a machine learning model by sharing parameters among nodes with an asynchronous communication protocol. We also propose an adaptive learning rate strategy and a sampling strategy to address the update lag of the overall parameter distribution, which improves the convergence speed when the parameters deviate from the optimal value: when one working node is faster than another, it has more time to compute its local gradient and can draw more samples for the next iteration. Our experiments demonstrate that DisSAGD significantly reduces waiting times during loop iterations and converges faster than traditional methods, and that it achieves speedups on distributed clusters.
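The record does not include the authors' code, but the mechanism described in the abstract (correcting each stochastic gradient with information from historical iterations, with no full-gradient pass and no per-sample gradient table) can be illustrated with a generic recursive variance-reduced update. The sketch below is a minimal single-node illustration in NumPy; the function names, the least-squares objective, and the SARAH-style recursion are assumptions chosen for illustration, not the DisSAGD update itself, and the asynchronous parameter sharing and adaptive sampling parts are omitted.

```python
# Hypothetical sketch: a recursive variance-reduced SGD step (SARAH-style),
# used here only to illustrate the kind of correction the abstract alludes to.
# It is NOT the paper's DisSAGD implementation.
import numpy as np

def grad(w, Xb, yb):
    """Mini-batch least-squares gradient: (1/m) * X^T (Xw - y)."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

def variance_reduced_sgd(X, y, lr=0.05, steps=200, batch=32, seed=0):
    """Each step corrects the fresh mini-batch gradient with the difference
    between gradients at the current and previous iterates, so the estimate
    carries information from historical iterations without a full gradient
    pass or a stored table of per-sample gradients."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w_prev = np.zeros(d)
    idx = rng.choice(n, size=batch, replace=False)
    v = grad(w_prev, X[idx], y[idx])          # initial gradient estimate
    w = w_prev - lr * v
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        # corrected estimate: fresh gradient plus carry-over of past corrections
        v = grad(w, Xb, yb) - grad(w_prev, Xb, yb) + v
        w_prev, w = w, w - lr * v
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 10))
    w_true = rng.normal(size=10)
    y = X @ w_true + 0.01 * rng.normal(size=1000)
    w_hat = variance_reduced_sgd(X, y)
    print("parameter error:", np.linalg.norm(w_hat - w_true))
```

In DisSAGD proper, a correction of this flavor is applied per worker under asynchronous parameter sharing, with faster workers drawing more samples for their next iteration; those distributed aspects are beyond this single-node sketch.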
Main Authors: | Haijie Pan, Lirong Zheng |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-07-01 |
Series: | Sensors |
Subjects: | gradient descent; machine learning; distributed cluster; adaptive sampling; variance reduction |
Online Access: | https://www.mdpi.com/1424-8220/21/15/5124 |
DOI: | 10.3390/s21155124 |
ISSN: | 1424-8220 |
Citation: | Sensors 21(15), 5124 (2021) |
Author Affiliations: | Haijie Pan and Lirong Zheng: School of Information Science and Engineering, Fudan University, Yangpu District, Shanghai 200433, China |