Bridging the Gap Between Memory and Communication Efficiency on Distributed Deep Learning Systems

Large-scale distributed deep learning is of great importance in various applications. For data-parallel distributed training systems, limited hardware resources (e.g., GPU memory and interconnection bandwidth) often become a performance bottleneck, and it is necessary to consider the full utilization...


Bibliographic Details
Main Authors: Shaofeng Zhao, Bo Liu, Fang Wang, Dan Feng
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9398682/