Improving exploration efficiency of deep reinforcement learning through samples produced by generative model

Authors:

Highlights:

• Propose a GAN-like framework to improve convergence speed.

• Our GRASP model is an alternative and complementary approach to existing exploration strategies.

• Learn from demonstrations to predict the subsequent action in reinforcement learning.

• Reshape the exploration space by selecting from the set of predicted actions (see the sketch after this list).

• Boost exploration and speed up convergence.
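
The highlights describe GRASP only at a high level: a generative model trained on demonstrations proposes candidate actions, and the agent explores by choosing among those candidates instead of over the full action space. Since this excerpt gives no implementation details, the Python sketch below is purely illustrative; the function grasp_style_action, the epsilon parameter, and the predicted_actions candidate set are hypothetical names, not the paper's actual interface.

import random

import numpy as np


def grasp_style_action(q_values, predicted_actions, epsilon=0.1):
    """Epsilon-greedy selection in which exploratory draws come from a
    generator-predicted candidate set rather than the full action space
    (an illustrative reading of the 'reshaped exploration space' idea)."""
    if random.random() < epsilon:
        # Explore only among actions the generative model deems plausible,
        # shrinking the effective exploration space.
        return random.choice(predicted_actions)
    # Exploit: pick the greedy action under the current Q-value estimates.
    return int(np.argmax(q_values))


# Toy usage: 6 discrete actions, generator proposes actions 1 and 4.
q_values = np.array([0.1, 0.5, 0.2, 0.05, 0.4, 0.3])
predicted_actions = [1, 4]
print(grasp_style_action(q_values, predicted_actions, epsilon=0.2))

Restricting the exploratory branch to the generator's candidate set is one way to realize the reshaped exploration space mentioned above; the paper's actual mechanism for producing and selecting candidate actions may differ.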

Keywords: Deep reinforcement learning, Exploration, Generative adversarial network, Sample efficiency, Convergence

Article history: Received 30 July 2020; Revised 21 July 2021; Accepted 25 July 2021; Available online 30 July 2021; Version of Record 4 August 2021.

DOI: https://doi.org/10.1016/j.eswa.2021.115680