Layered Relative Entropy Policy Search

Abstract

Many reinforcement learning problems are hierarchical in nature. Exploiting this property can ease the learning process. In this paper, a hierarchical policy search method based on clustering is presented. We use a hierarchical policy composed of a high-level gating policy and a set of low-level sub-policies. Depending on the observed state, the gating policy chooses one of the sub-policies for action selection. The gating policy and the sub-policies are each provided with a learning method to update their parameters. If a task is truly hierarchical, its state–action space must contain meaningful clusters. We cluster the observed samples and use each cluster to update one sub-policy; in this way, each sub-policy is adapted to a portion of the state–action space. The gating policy is then updated using the updated sub-policy probability density functions and the observed samples. We evaluate our method on three multimodal tasks as well as a simulated and a real robotic manipulation task. Our experiments show that our method can discover versatile sub-policies for the multimodal tasks and the manipulation task. Moreover, we point out errors in some of the equations of the Hierarchical Relative Entropy Policy Search paper and provide the necessary corrections.
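The abstract outlines an EM-style scheme: observed samples are soft-clustered, each cluster refits one sub-policy, and the gating policy is then refit from the sub-policy densities. Below is a minimal illustrative sketch of that structure; it is not the paper's Layered REPS algorithm (all class and function names here are hypothetical, and plain weighted regression stands in for the paper's KL-constrained, reward-weighted updates).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SubPolicy:
    """Linear-Gaussian sub-policy: a ~ N(theta^T [s, 1], sigma^2 I)."""
    def __init__(self, state_dim, action_dim):
        self.theta = 0.1 * rng.standard_normal((state_dim + 1, action_dim))
        self.sigma = 1.0

    def _feats(self, S):
        return np.hstack([S, np.ones((len(S), 1))])  # append bias feature

    def log_prob(self, S, A):
        d = A - self._feats(S) @ self.theta
        k = A.shape[1]
        return (-0.5 * (d ** 2).sum(axis=1) / self.sigma ** 2
                - k * np.log(self.sigma) - 0.5 * k * np.log(2 * np.pi))

    def fit(self, S, A, w):
        # Weighted ridge regression on this sub-policy's cluster of samples.
        X = self._feats(S)
        G = X * w[:, None]
        self.theta = np.linalg.solve(X.T @ G + 1e-6 * np.eye(X.shape[1]),
                                     G.T @ A)
        d = A - X @ self.theta
        self.sigma = np.sqrt((w @ (d ** 2).sum(axis=1))
                             / (max(w.sum(), 1e-8) * A.shape[1]))

class GatingPolicy:
    """Softmax gate pi(o | s) that picks a sub-policy given the state."""
    def __init__(self, state_dim, n_options):
        self.W = np.zeros((n_options, state_dim + 1))

    def probs(self, S):
        return softmax(np.hstack([S, np.ones((len(S), 1))]) @ self.W.T)

    def fit(self, S, R, lr=0.5, iters=200):
        # R[i, o]: responsibility of sub-policy o for sample i; gradient
        # ascent on the responsibility-weighted softmax log-likelihood.
        X = np.hstack([S, np.ones((len(S), 1))])
        for _ in range(iters):
            self.W += lr * (R - self.probs(S)).T @ X / len(S)

def update(gate, subs, S, A, w):
    """One clustering/update round on a batch of (state, action, weight) samples.

    E-step: soft cluster assignments from gate prior x sub-policy likelihood.
    M-step: each sub-policy refits on its own cluster; the gate then refits
    on the assignments, learning which region of the state space each covers.
    """
    log_p = np.stack([p.log_prob(S, A) for p in subs], axis=1)
    R = gate.probs(S) * np.exp(log_p - log_p.max(axis=1, keepdims=True))
    R /= R.sum(axis=1, keepdims=True)
    for o, p in enumerate(subs):
        p.fit(S, A, w * R[:, o])
    gate.fit(S, R)
```

A training loop would alternate between collecting weighted samples with the current hierarchy and calling update; the reward weighting w is where a relative-entropy (KL-bounded) step, as in REPS, would enter.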

Keywords: Hierarchical policy search, Reinforcement learning, Kullback–Leibler divergence, Clustering

Article history: Received 25 December 2019, Revised 3 April 2021, Accepted 6 April 2021, Available online 17 April 2021, Version of Record 17 April 2021.

DOI: https://doi.org/10.1016/j.knosys.2021.107025