Anchor: The achieved goal to replace the subgoal for hierarchical reinforcement learning

Authors:

Highlights:

Abstract

Hierarchical reinforcement learning (HRL) extends traditional reinforcement learning to complex tasks such as long-horizon continuous control. As an effective HRL paradigm, subgoal-based methods use subgoals to provide intrinsic motivation that helps the agent reach the desired goal. However, determining suitable subgoals is difficult. In this paper, we present a new concept, the anchor, to replace the subgoal. An anchor is selected from the goals the agent has already achieved. Building on the anchor, we propose a new HRL method that encourages the agent to move quickly away from its anchor in the direction of the desired goal. Specifically, to encourage fast movement, the method uses an intrinsic reward computed from the distance between the current achieved goal and the corresponding anchor; to encourage movement in the right direction, it weights this intrinsic reward by the extrinsic rewards collected while moving away from the anchor. Experiments demonstrate the effectiveness of the proposed method on long-horizon continuous control tasks.
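The abstract describes the reward scheme only at a high level. Below is a minimal sketch of how the anchor-based intrinsic reward could be computed, assuming a Euclidean distance and a simple sum-based weighting by extrinsic rewards; the function name and exact formulation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def anchor_intrinsic_reward(achieved_goal, anchor, extrinsic_rewards):
    """Hypothetical sketch of an anchor-based intrinsic reward.

    The agent is rewarded for moving away from its anchor (a previously
    achieved goal), and the distance term is weighted by the extrinsic
    rewards collected since the anchor was set, so that fast movement is
    reinforced only when it also points toward the desired goal.
    """
    # Distance between the current achieved goal and the anchor
    # (Euclidean distance is an assumption; the paper may use another metric).
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(anchor))

    # Weight by extrinsic rewards accumulated while moving away from the
    # anchor (a plain sum is an assumption about the weighting scheme).
    weight = float(np.sum(extrinsic_rewards))

    return weight * distance


# Toy usage: an agent two units from its anchor with small extrinsic rewards.
r = anchor_intrinsic_reward(
    achieved_goal=[1.0, 2.0],
    anchor=[0.0, 0.0],
    extrinsic_rewards=[0.0, 0.1, 0.2],
)
```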

Keywords: Hierarchical reinforcement learning, Reinforcement learning, Continuous control, Intrinsic motivation

Article history: Received 1 January 2021, Revised 12 April 2021, Accepted 5 May 2021, Available online 8 May 2021, Version of Record 13 May 2021.

DOI: https://doi.org/10.1016/j.knosys.2021.107128