Model primitives for hierarchical lifelong reinforcement learning

Authors: Bohan Wu, Jayesh K. Gupta, Mykel Kochenderfer

Abstract

Learning interpretable and transferable subpolicies and performing task decomposition from a single, complex task is difficult. Such decomposition can lead to immense sample-efficiency gains in lifelong learning. Some traditional hierarchical reinforcement learning techniques enforce this decomposition in a top-down manner, while meta-learning techniques require access to a task distribution to learn such decompositions. This article presents a framework for using diverse suboptimal world models to decompose complex task solutions into simpler modular subpolicies. Given these world models, the framework decomposes a single source task in a bottom-up manner, concurrently learning the required modular subpolicies as well as a controller to coordinate them. We perform a series of experiments on high-dimensional continuous-action control tasks to demonstrate the effectiveness of this approach at both complex single-task learning and lifelong learning. Finally, we perform ablation studies to understand the importance and robustness of different elements in the framework and the limitations of this approach.
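The abstract describes the architecture only at a high level: a set of modular subpolicies plus a controller that coordinates them. The following minimal Python sketch illustrates that general controller-plus-subpolicies pattern. All names (`SubPolicy`, `Controller`, `HierarchicalAgent`) and the linear/softmax parameterizations are illustrative assumptions, not the authors' implementation; in particular, the paper's use of world models to drive the decomposition is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

class SubPolicy:
    """Illustrative modular subpolicy: a linear map from state to a bounded action."""
    def __init__(self, state_dim, action_dim):
        self.W = rng.normal(scale=0.1, size=(action_dim, state_dim))

    def act(self, state):
        return np.tanh(self.W @ state)

class Controller:
    """Illustrative controller: picks the active subpolicy via a softmax over linear scores."""
    def __init__(self, state_dim, n_subpolicies):
        self.W = rng.normal(scale=0.1, size=(n_subpolicies, state_dim))

    def select(self, state):
        logits = self.W @ state
        probs = np.exp(logits - logits.max())  # stable softmax
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

class HierarchicalAgent:
    """Hypothetical composition: the controller routes each state to one subpolicy."""
    def __init__(self, state_dim, action_dim, n_subpolicies):
        self.controller = Controller(state_dim, n_subpolicies)
        self.subpolicies = [SubPolicy(state_dim, action_dim)
                            for _ in range(n_subpolicies)]

    def act(self, state):
        k = self.controller.select(state)
        return k, self.subpolicies[k].act(state)

# Usage: one decision step on a random 4-dimensional state.
agent = HierarchicalAgent(state_dim=4, action_dim=2, n_subpolicies=3)
k, action = agent.act(rng.normal(size=4))
print(f"subpolicy {k} chose action {action}")
```

In a lifelong-learning setting of the kind the abstract targets, the appeal of this structure is that the subpolicies can be reused across tasks while only the controller (the routing component) needs to be relearned or adapted for each new task.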

Keywords: Reinforcement learning, Task decomposition, Transfer, Lifelong learning, Hierarchical learning

Paper URL: https://doi.org/10.1007/s10458-020-09451-0