Communication Usage Optimization of Gradient Sparsification with Aggregation


Bibliographic Details
Main Authors: Sheng-Ping Wang, 王盛平
Other Authors: Pangfeng Liu
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/ppyyqb
Description
Summary: Master's thesis === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 106 === Communication usage is a bottleneck when scaling the number of workers in distributed deep learning. One solution is gradient sparsification, which compresses the exchanged gradients into a sparse format. We found that the server's send cost, which is the size of the aggregated sparse gradient, can be reduced through the way workers select gradients. Following the observation that only a few gradients are significantly large, and only for a short period of time, we proposed several gradient selection algorithms based on different metrics. Experiments showed that the proposed method can reduce the aggregated size sent by the server, and that the resulting reduction in time per iteration can make convergence faster than with traditional sparsification.
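
As a rough illustration of the server send cost described in the summary, the following minimal Python sketch sparsifies two workers' gradients and measures the size of the aggregate. It assumes plain top-k magnitude selection as the baseline sparsifier; the names `top_k_sparsify` and `aggregate` and the sample gradient values are hypothetical, and this is not one of the selection algorithms proposed in the thesis.

```python
from typing import Dict, List

# Sparse gradient: parameter index -> gradient value (illustrative representation).
SparseGrad = Dict[int, float]


def top_k_sparsify(grad: List[float], k: int) -> SparseGrad:
    """Keep the k entries with the largest magnitude (assumed baseline selection)."""
    ranked = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    return {i: grad[i] for i in ranked[:k]}


def aggregate(sparse_grads: List[SparseGrad]) -> SparseGrad:
    """Server-side sum; the result holds the union of all selected indices."""
    total: SparseGrad = {}
    for sg in sparse_grads:
        for i, v in sg.items():
            total[i] = total.get(i, 0.0) + v
    return total


# Made-up dense gradients from two workers.
worker_a = [0.9, -0.1, 0.05, 0.7, 0.0, -0.02]
worker_b = [0.8, 0.02, -0.6, 0.1, 0.03, 0.01]

sparse_a = top_k_sparsify(worker_a, k=2)  # keeps indices 0 and 3
sparse_b = top_k_sparsify(worker_b, k=2)  # keeps indices 0 and 2
merged = aggregate([sparse_a, sparse_b])

# Each worker sends 2 entries; the server must send back the union.
print(sorted(sparse_a), sorted(sparse_b))
print(len(merged))
```

In this toy case both workers select index 0, so the aggregate holds three entries rather than four. The more the workers' selections overlap, the smaller the aggregate the server has to send, which is the quantity the summary says gradient selection can reduce.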