Stochastic cubic-regularized policy gradient method
Authors:
Highlights:
• We propose a new algorithm, SCR-PG, which combines the desirable properties of previous methods.
• We provide a non-asymptotic analysis of SCR-PG’s complexity with high probability.
• Experimental results are presented to validate the superior performance of SCR-PG.
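The name SCR-PG and the keywords below suggest a policy gradient method built on stochastic cubic regularization, i.e., each update minimizes a cubic-regularized second-order model of the objective. As an illustration only (not the paper's algorithm; the function name, solver choice, and parameters here are my own), a minimal sketch of solving the cubic-regularized subproblem by gradient descent on the model:

```python
import numpy as np

def cubic_subproblem(g, H, sigma, lr=0.01, iters=500):
    """Minimize the cubic model m(s) = g.s + 0.5 s'Hs + (sigma/3)||s||^3
    by plain gradient descent on m (simple, not the fastest solver).

    g: (stochastic) gradient estimate, H: (stochastic) Hessian estimate,
    sigma: cubic regularization coefficient."""
    s = np.zeros_like(g)
    for _ in range(iters):
        # gradient of the cubic model at s
        grad_m = g + H @ s + sigma * np.linalg.norm(s) * s
        s -= lr * grad_m
    return s

# Toy example: a small quadratic with PSD Hessian.
g = np.array([1.0, -2.0])
H = np.array([[2.0, 0.0], [0.0, 3.0]])
s = cubic_subproblem(g, H, sigma=1.0)
```

In a cubic-regularized policy gradient loop, `g` and `H` would be mini-batch estimates of the policy gradient and Hessian, and the returned step `s` would update the policy parameters; the cubic term is what lets such methods escape saddle points and reach second-order stationary points.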
Keywords: Reinforcement learning, Policy gradient, Stochastic optimization, Non-convex optimization, Second-order stationary point
Article history: Received 18 April 2022, Revised 28 July 2022, Accepted 11 August 2022, Available online 19 August 2022, Version of Record 5 September 2022.
DOI: https://doi.org/10.1016/j.knosys.2022.109687