Accelerating local SGD for non-IID data using variance reduction
Authors: Xianfeng Liang, Shuheng Shen, Enhong Chen, Jinchang Liu, Qi Liu, Yifei Cheng, Zhen Pan
Abstract
Distributed stochastic gradient descent and its variants, which employ multiple workers in parallel, have been widely adopted for training machine learning models. Among them, local-based algorithms, including Local SGD and FedAvg, have gained much attention due to their favorable properties, such as low communication cost and privacy preservation. Nevertheless, when the data distributions on workers are non-identical, local-based algorithms suffer a significant degradation in convergence rate. In this paper, we propose Variance Reduced Local SGD (VRL-SGD) to deal with heterogeneous data. Without extra communication cost, VRL-SGD reduces the gradient variance among workers caused by heterogeneous data, and thus prevents the slow convergence that local-based algorithms otherwise exhibit. Moreover, we present VRL-SGD-W with an effective warm-up mechanism for scenarios where the data among workers are highly diverse. Benefiting from eliminating the impact of such heterogeneous data, we theoretically prove that VRL-SGD achieves a linear iteration speedup with lower communication complexity even if workers access non-identical datasets. We conduct experiments on three machine learning tasks. The experimental results demonstrate that VRL-SGD performs significantly better than Local SGD on heterogeneous data and that VRL-SGD-W is much more robust under high data variance among workers.
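The following is a minimal toy sketch of the idea described in the abstract: each worker runs several local SGD steps between synchronizations, and a per-worker correction term (a control variate) offsets the gradient bias introduced by its non-identical local data. The specific update rule, variable names, and hyperparameters below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Toy non-IID setup (assumed for illustration): worker k minimizes the
# quadratic 0.5 * ||x - b_k||^2, so each worker has a different local optimum
# while the global optimum is mean(b_k).
rng = np.random.default_rng(0)
dim, num_workers, local_steps, rounds, lr = 10, 4, 20, 50, 0.05

b = rng.normal(size=(num_workers, dim))      # per-worker data (local optima)
x_global = np.zeros(dim)                     # shared model after each round
corrections = np.zeros((num_workers, dim))   # per-worker variance-reduction terms

for _ in range(rounds):
    # Each worker starts the round from the current global model.
    local_models = np.tile(x_global, (num_workers, 1))
    for k in range(num_workers):
        for _ in range(local_steps):
            grad = local_models[k] - b[k]    # local gradient (noise-free here)
            # Local SGD step with a correction that offsets the drift caused
            # by the worker's non-IID objective.
            local_models[k] -= lr * (grad - corrections[k])
    # Synchronize: average the local models (no extra vectors communicated
    # beyond what is needed to form the average in this sketch).
    new_global = local_models.mean(axis=0)
    # Update each worker's correction from the gap between its local drift
    # and the average drift accumulated over this communication round.
    corrections += (new_global - local_models) / (lr * local_steps)
    x_global = new_global

print("distance to global optimum:", np.linalg.norm(x_global - b.mean(axis=0)))
```

In this sketch the correction for worker k converges toward the deviation of its local gradient from the average gradient, so the corrected local updates all follow the global descent direction despite the heterogeneous data; this is the general variance-reduction mechanism the abstract describes, not the paper's precise update.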
Keywords: distributed optimization, variance reduction, local SGD, federated learning, non-IID data
Paper URL: https://doi.org/10.1007/s11704-021-1018-0