Regret bounded by gradual variation for online convex optimization

作者:Tianbao Yang, Mehrdad Mahdavi, Rong Jin, Shenghuo Zhu

摘要

Recently, it has been shown that the regret of the Follow the Regularized Leader (FTRL) algorithm for online linear optimization can be bounded by the total variation of the cost vectors rather than the number of rounds. In this paper, we extend this result to general online convex optimization. In particular, this resolves an open problem that has been posed in a number of recent papers. We first analyze the limitations of the FTRL algorithm as proposed by Hazan and Kale (in Machine Learning 80(2–3), 165–188, 2010) when applied to online convex optimization, and extend the definition of variation to a gradual variation which is shown to be a lower bound of the total variation. We then present two novel algorithms that bound the regret by the gradual variation of cost functions. Unlike previous approaches that maintain a single sequence of solutions, the proposed algorithms maintain two sequences of solutions that make it possible to achieve a variation-based regret bound for online convex optimization.

论文关键词:Online convex optimization, Regret bound, Gradual variation, Bandit

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10994-013-5418-8