Variance reduced optimization with implicit gradient transport

Authors:

Highlights:

Abstract

Reducing the variance of stochastic optimization methods is a challenging problem. The most widely used approach is to reuse past gradients. Although second order information has recently been shown to be effective for tracking gradients, research in this direction remains limited because of its high computational burden. In this paper, we introduce a modified form of the implicit gradient transport (MIGT) technique into the classical stochastic variance reduced gradient method, SVRG, which leads to a new method called SVRG–MIGT. SVRG–MIGT exploits second order information to provide faster variance reduction, but without computing the Hessian explicitly. We analyze the theoretical convergence properties of SVRG–MIGT for strongly convex objective functions and prove that it is comparable to state-of-the-art methods. To further confirm the effectiveness of MIGT, we introduce it into the accelerated mini-batch Prox-SVRG (Acc-Prox-SVRG) method, obtaining a novel variant called Acc-Prox-SVRG–MIGT. In addition, to greatly reduce the computational complexity, we propose L-SVRG–MIGT by introducing MIGT into loopless SVRG (L-SVRG), which recomputes the full gradient only with a small probability, and we establish a convergence analysis for L-SVRG–MIGT. Experimental results on a wide range of data sets demonstrate the efficacy and robustness of our proposed methods.
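As a rough illustration of the idea (not the paper's exact algorithm), the sketch below combines a standard SVRG inner loop with an implicit-gradient-transport-style extrapolated evaluation point, so that past gradient corrections are implicitly carried to the current iterate without forming a Hessian. The step size `eta`, the extrapolation coefficient `gamma`, and the oracle names `grad_i`/`full_grad` are all assumptions made for this example.

```python
# Minimal, hypothetical sketch of an SVRG inner loop with IGT-style extrapolation.
# It is NOT the authors' SVRG-MIGT algorithm: update rule, gamma schedule, and
# all names here are illustrative assumptions.
import numpy as np

def svrg_with_igt_extrapolation(grad_i, full_grad, x0, n,
                                eta=0.01, gamma=0.9, epochs=20, m=None, rng=None):
    """Minimize F(x) = (1/n) * sum_i f_i(x).

    grad_i(i, x)  -> gradient of the i-th component f_i at x
    full_grad(x)  -> gradient of F at x
    """
    rng = np.random.default_rng() if rng is None else rng
    m = n if m is None else m              # inner-loop length (one pass by default)
    x = np.array(x0, dtype=float)
    x_prev = x.copy()
    for _ in range(epochs):
        snapshot = x.copy()
        mu = full_grad(snapshot)           # full gradient at the snapshot point
        for _ in range(m):
            i = rng.integers(n)
            # Implicit gradient transport: evaluate the stochastic gradients at a
            # point extrapolated along the last displacement, so the variance-reduced
            # correction is implicitly transported to the current iterate.
            z = x + gamma * (x - x_prev)
            v = grad_i(i, z) - grad_i(i, snapshot) + mu   # SVRG-style direction
            x_prev, x = x, x - eta * v
    return x
```

In a loopless variant in the spirit of L-SVRG, the explicit outer loop would be removed and the snapshot and full gradient would instead be refreshed at each iteration with a small probability, which is where the reduction in computational complexity mentioned above comes from.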

Keywords: Stochastic optimization, Variance reduction, Second-order optimization

Article history: Received 23 May 2020, Revised 9 September 2020, Accepted 24 November 2020, Available online 25 November 2020, Version of Record 26 November 2020.

DOI: https://doi.org/10.1016/j.knosys.2020.106626