Distributed optimization for degenerate loss functions arising from over-parameterization

Authors:

Abstract

We consider distributed optimization with degenerate loss functions, where the optimal sets of the local loss functions have a non-empty intersection. This regime often arises when optimizing large-scale multi-agent AI systems (e.g., deep learning systems), in which the number of trainable weights far exceeds the number of training samples, leading to highly degenerate loss surfaces. Under appropriate conditions, we prove that in this setting distributed gradient descent converges even when communication is made arbitrarily infrequent, which is not the case for non-degenerate loss functions. Moreover, we quantitatively analyze the convergence rate, as well as the communication-computation trade-off, providing insights into designing efficient distributed optimization algorithms. Our theoretical findings are confirmed by both distributed convex optimization and deep learning experiments.
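To make the setting concrete, below is a minimal sketch (not the paper's algorithm or experiments) of distributed gradient descent with infrequent communication on degenerate local losses. Each agent is given an over-parameterized least-squares problem constructed so that a shared interpolating solution exists, so the local optimal sets intersect by construction; the names `local_grad` and `distributed_gd` and all parameter values are illustrative assumptions.

```python
# Sketch: distributed GD with H local steps per communication round on
# degenerate (over-parameterized) local least-squares losses
# f_i(w) = 0.5 * ||A_i w - b_i||^2, with dim >> samples per agent.
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, samples_per_agent = 4, 50, 5     # dim >> samples => degenerate
w_star = rng.normal(size=dim)                   # shared interpolating solution

# Local data: b_i = A_i w_star, so w_star minimizes every local loss.
A = [rng.normal(size=(samples_per_agent, dim)) for _ in range(n_agents)]
b = [A_i @ w_star for A_i in A]

def local_grad(i, w):
    return A[i].T @ (A[i] @ w - b[i])

def distributed_gd(local_steps, rounds, lr=0.01):
    """Each agent runs `local_steps` GD steps, then parameters are averaged
    (one communication per round)."""
    w = [np.zeros(dim) for _ in range(n_agents)]
    for _ in range(rounds):
        for i in range(n_agents):
            for _ in range(local_steps):
                w[i] = w[i] - lr * local_grad(i, w[i])
        w_avg = sum(w) / n_agents               # communication: averaging
        w = [w_avg.copy() for _ in range(n_agents)]
    return w_avg

# With degenerate (interpolating) losses, even infrequent communication
# (large local_steps H) still drives the average loss toward zero,
# for the same total number of local gradient steps.
for H in (1, 10, 100):
    w_out = distributed_gd(local_steps=H, rounds=1000 // H)
    loss = sum(0.5 * np.linalg.norm(A[i] @ w_out - b[i])**2
               for i in range(n_agents)) / n_agents
    print(f"local steps H={H:3d}  average loss={loss:.2e}")
```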

Keywords: Distributed optimization, Over-parameterization, Deep learning

Article history: Received 4 July 2020, Revised 1 July 2021, Accepted 2 August 2021, Available online 16 August 2021, Version of Record 25 August 2021.

DOI: https://doi.org/10.1016/j.artint.2021.103575