Summary: | Growing concern about data privacy and security has motivated the proposal of federated learning, in which computing nodes synchronize only their locally trained models, rather than their original data, during distributed training. The conventional federated learning architecture, inherited from the parameter server design, relies on highly centralized topologies and large node-to-server bandwidths. However, in real-world federated learning scenarios, the network capacities between nodes are highly uniformly distributed and smaller than those in data centers. As a result, efficiently utilizing the network capacity between computing nodes is a crucial challenge for conventional federated learning. In this paper, we propose Bandwidth Aware Combo (BACombo), a model-segment-level decentralized federated learning approach, to tackle this problem. In BACombo, we propose a segmented gossip aggregation mechanism that makes full use of node-to-node bandwidth to reduce communication time. In addition, a bandwidth-aware worker selection model further reduces transmission delay by greedily choosing workers with sufficient bandwidth. Convergence guarantees are provided for BACombo. Experimental results on various datasets demonstrate that the training time is reduced by up to 18 times compared with the baselines, without accuracy degradation.
|