Incremental reinforcement learning for multi-objective robotic tasks

Authors: Javier García, Roberto Iglesias, Miguel A. Rodríguez, Carlos V. Regueiro

Abstract

Recently, reinforcement learning has been widely applied to robotic tasks. However, most of these tasks involve more than one objective, and in such cases the construction of a reward function becomes a key and difficult issue. A typical solution is to combine the multiple objectives into a single-objective reward function. However, this formulation is often far from intuitive, and the learning process may converge to a behaviour far from the one we need. Another way to face these multi-objective tasks is transfer learning, in which the experience gained while learning one objective is reused to learn a new one. Nevertheless, the transfer affects only the learned policy, leaving out other gathered information that might be relevant. In this paper, we propose a different approach to learning problems with more than one objective. In particular, we describe a two-stage approach. During the first stage, our algorithm learns a policy compatible with a main goal while it gathers relevant information for a subsequent search process. Once this is done, a second stage starts: a cyclical process of small perturbations and stabilizations that searches for a new policy which remains valid for the main goal but also optimizes a sub-objective, while avoiding any degradation of the system's performance. We have applied our proposal to the learning of biped walking and tested it on a humanoid robot, both in simulation and on the real robot.
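The abstract only outlines the two-stage scheme at a high level. The following is a minimal, self-contained toy sketch of that control flow under assumed simplifications: a policy is just a parameter vector, and the "main goal" and "sub-objective" are synthetic functions. All names and thresholds are hypothetical placeholders, not identifiers or details from the paper, which applies the idea to biped walking.

```python
# Toy sketch of the two-stage scheme described in the abstract.
# The task, policies and objectives are hypothetical stand-ins,
# not taken from the paper.

import random

DIM = 4                 # size of the policy parameter vector (assumed)
MAIN_THRESHOLD = -0.5   # main goal: keep main_score above this value (assumed)


def main_score(theta):
    """Synthetic main objective (e.g. 'walk without falling')."""
    return -sum(x * x for x in theta)


def sub_score(theta):
    """Synthetic secondary objective (e.g. 'walk faster')."""
    return sum(theta)


def main_goal_ok(theta):
    return main_score(theta) >= MAIN_THRESHOLD


def stage1_learn_main_policy(rng, tries=2000):
    """Stage 1: find a policy compatible with the main goal (here via
    naive random search) and gather information for the later search."""
    for _ in range(tries):
        theta = [rng.uniform(-1, 1) for _ in range(DIM)]
        if main_goal_ok(theta):
            info = {"scale": 0.05}   # gathered info: a perturbation scale
            return theta, info
    raise RuntimeError("no policy satisfying the main goal was found")


def stage2_perturb_and_stabilize(theta, info, rng, cycles=500):
    """Stage 2: cycle of small perturbations and stabilizations that keeps
    the main goal satisfied while improving the sub-objective."""
    best, best_sub = theta, sub_score(theta)
    for _ in range(cycles):
        cand = [x + rng.gauss(0, info["scale"]) for x in best]  # small perturbation
        while not main_goal_ok(cand):                           # stabilize: pull back
            cand = [0.9 * x for x in cand]
        if sub_score(cand) > best_sub:                          # accept improvements only
            best, best_sub = cand, sub_score(cand)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    policy, info = stage1_learn_main_policy(rng)
    policy = stage2_perturb_and_stabilize(policy, info, rng)
    print("main:", round(main_score(policy), 3), "sub:", round(sub_score(policy), 3))
```

The key design point reflected here is that the sub-objective is only ever improved from policies that already satisfy the main goal, so performance on the main task is never traded away during the search.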

Keywords: Reinforcement learning, Multi-objective optimization, Robotic tasks, Policy search

DOI: https://doi.org/10.1007/s10115-016-0992-2