Stochastic cubic-regularized policy gradient method

Authors:

Highlights:

• We propose a new algorithm, SCR-PG, that combines the desirable properties of previous methods.

• We provide a non-asymptotic analysis of SCR-PG’s complexity with high probability.

• Experimental results are presented to validate the superior performance of SCR-PG.
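The paper itself only names SCR-PG, so as a point of reference, the following is a minimal sketch of the standard cubic-regularization step (in the style of Nesterov–Polyak cubic-regularized Newton) that stochastic cubic-regularized methods build on. The subproblem solver, step sizes, and the toy quadratic objective are all illustrative assumptions, not the authors' algorithm; in SCR-PG the gradient and Hessian would be stochastic policy-gradient estimates.

```python
import numpy as np

def cubic_subproblem(g, H, M, steps=500, lr=0.01):
    """Approximately minimize the cubic-regularized model
        m(d) = g.d + 0.5 d'Hd + (M/6)||d||^3
    by gradient descent on m (a common, simple inner solver)."""
    d = np.zeros_like(g)
    for _ in range(steps):
        # grad m(d) = g + Hd + (M/2)||d|| d
        grad = g + H @ d + 0.5 * M * np.linalg.norm(d) * d
        d -= lr * grad
    return d

# Toy deterministic objective f(x) = 0.5 x'Ax + b'x (hypothetical stand-in;
# SCR-PG would use sampled trajectories to estimate g and H).
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -2.0])
x = np.zeros(2)
for _ in range(20):
    g = A @ x + b   # exact gradient here; stochastic estimate in practice
    H = A           # exact Hessian here; Hessian-vector products in practice
    x = x + cubic_subproblem(g, H, M=1.0)
# x approaches the minimizer -A^{-1} b = [-1/3, 2]
```

The cubic term penalizes long steps where the quadratic model is untrustworthy, which is what lets such methods escape saddle points and reach second-order stationary points, the property highlighted above.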


Keywords: Reinforcement learning, Policy gradient, Stochastic optimization, Non-convex optimization, Second-order stationary point

Article history: Received 18 April 2022, Revised 28 July 2022, Accepted 11 August 2022, Available online 19 August 2022, Version of Record 5 September 2022.

DOI: https://doi.org/10.1016/j.knosys.2022.109687