A novel multi-step Q-learning method to improve data efficiency for deep reinforcement learning

Authors:

Highlights:

• A novel multi-step Q-learning method is proposed to improve data efficiency for DRL.

• The proposed multi-step Q-learning method is derived by adopting a new return function.

• The new return function alters the discount of future rewards and loosens the impact of the immediate reward.

• Experimental results show that the proposed methods can improve the data efficiency of DRL agents.
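
For context, the highlights refer to changing how future rewards are discounted in a multi-step return. The listing does not give the paper's new return function, so the sketch below shows only the standard n-step Q-learning target that multi-step methods usually start from. The function name `n_step_target` and its parameters are illustrative assumptions, not taken from the paper.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Standard n-step return (NOT the paper's new return function).

    Computes sum_{k=0}^{n-1} gamma^k * r_k + gamma^n * bootstrap_value,
    where bootstrap_value is typically max_a Q(s_{t+n}, a).
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for r in reversed(rewards):
        target = r + gamma * target
    return target


if __name__ == "__main__":
    # Two-step example with hypothetical rewards and bootstrap value.
    print(n_step_target([1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

A modified return of the kind the highlights describe would presumably replace the fixed `gamma**k` weighting with a different scheme; the exact form is given in the paper itself.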

Keywords: Deep reinforcement learning, Robotics, Multi-step methods, Data efficiency

Review history: Received 5 November 2018, Revised 7 March 2019, Accepted 18 March 2019, Available online 21 March 2019, Version of Record 26 April 2019.

Paper URL: https://doi.org/10.1016/j.knosys.2019.03.018