Model primitives for hierarchical lifelong reinforcement learning

Authors: Bohan Wu, Jayesh K. Gupta, Mykel Kochenderfer

Abstract

Learning interpretable and transferable subpolicies and performing task decomposition from a single, complex task is difficult. Such decomposition can lead to immense sample-efficiency gains in lifelong learning. Some traditional hierarchical reinforcement learning techniques enforce this decomposition in a top-down manner, while meta-learning techniques require access to a task distribution to learn such decompositions. This article presents a framework for using diverse suboptimal world models to decompose complex task solutions into simpler modular subpolicies. Given these world models, the framework decomposes a single source task in a bottom-up manner, concurrently learning the required modular subpolicies as well as a controller to coordinate them. We perform a series of experiments on high-dimensional continuous-action control tasks to demonstrate the effectiveness of this approach at both complex single-task learning and lifelong learning. Finally, we perform ablation studies to understand the importance and robustness of different elements in the framework and the limitations of this approach.
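The abstract describes the architecture only at a high level: a set of modular subpolicies plus a controller that coordinates them. The following minimal Python sketch illustrates that general controller-plus-subpolicies pattern. All names (`SubPolicy`, `Controller`, `HierarchicalAgent`) and the linear/softmax parameterizations are illustrative assumptions, not the authors' implementation; in particular, the paper's use of world models to drive the decomposition is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

class SubPolicy:
    """Illustrative modular subpolicy: a linear map from state to a bounded action."""
    def __init__(self, state_dim, action_dim):
        self.W = rng.normal(scale=0.1, size=(action_dim, state_dim))

    def act(self, state):
        return np.tanh(self.W @ state)

class Controller:
    """Illustrative controller: picks the active subpolicy via a softmax over linear scores."""
    def __init__(self, state_dim, n_subpolicies):
        self.W = rng.normal(scale=0.1, size=(n_subpolicies, state_dim))

    def select(self, state):
        logits = self.W @ state
        probs = np.exp(logits - logits.max())  # stable softmax
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

class HierarchicalAgent:
    """Hypothetical composition: the controller routes each state to one subpolicy."""
    def __init__(self, state_dim, action_dim, n_subpolicies):
        self.controller = Controller(state_dim, n_subpolicies)
        self.subpolicies = [SubPolicy(state_dim, action_dim)
                            for _ in range(n_subpolicies)]

    def act(self, state):
        k = self.controller.select(state)
        return k, self.subpolicies[k].act(state)

# Usage: one decision step on a random 4-dimensional state.
agent = HierarchicalAgent(state_dim=4, action_dim=2, n_subpolicies=3)
k, action = agent.act(rng.normal(size=4))
print(f"subpolicy {k} chose action {action}")
```

In a lifelong-learning setting of the kind the abstract targets, the appeal of this structure is that the subpolicies can be reused across tasks while only the controller (the routing component) needs to be relearned or adapted for each new task.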

Keywords: Reinforcement learning, Task decomposition, Transfer, Lifelong learning, Hierarchical learning

Paper URL: https://doi.org/10.1007/s10458-020-09451-0