Additional planning with multiple objectives for reinforcement learning

Authors:

Highlights:

Abstract

Most control tasks involve multiple objectives that must be achieved simultaneously, yet the reward is typically defined as a weighted combination of all objectives so that a single optimal policy can be determined. This configuration limits exploration flexibility and makes it difficult to reach a satisfactory terminal condition. Although several multi-objective reinforcement learning (MORL) methods have been proposed recently, they concentrate on obtaining a set of compromise solutions rather than one best-performing strategy. On the other hand, existing policy-improvement methods have rarely addressed multi-objective settings. Inspired by enhanced policy search methods, this paper proposes an additional planning technique with multiple objectives for reinforcement learning, denoted RLAP-MOP. The method evaluates parallel requirements simultaneously and suggests several optimal feasible actions to further improve long-term performance. Meanwhile, the short-term planning adopted in this paper helps maintain safe trajectories and build more accurate approximate models, which accelerates training. For comparison, an RLAP variant with single-objective optimization is also introduced in the theoretical and experimental studies. The proposed techniques are investigated on a multi-objective cartpole environment and a soft robotic palpation task. The improved return values and learning stability demonstrate that additional planning based on multiple objectives is a promising assistant for improving reinforcement learning.
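The abstract does not give algorithmic details, so the following is only a minimal sketch of the general idea it describes: score a small set of candidate actions over a short planning horizon under an approximate model, evaluating all objectives in parallel, and suggest the non-dominated ones to the learner. The ToyModel, the candidate set, the Pareto filter, and all names below are illustrative assumptions, not the paper's RLAP-MOP implementation.

```python
import numpy as np

class ToyModel:
    """Stand-in approximate model with linear dynamics and two toy
    objectives (hypothetical; the paper learns its model from data)."""
    def step(self, s, a):
        s_next = 0.9 * s + a
        # Objective 1 favors states near the origin; objective 2 favors
        # small control effort.
        return s_next, np.array([-abs(s_next), -abs(a)])

def rollout_returns(model, state, action, policy, horizon):
    """Take `action` first, then follow `policy` for a short simulated
    rollout; return the summed reward vector (one entry per objective)."""
    s, a, totals = state, action, None
    for _ in range(horizon):
        s, r = model.step(s, a)
        totals = r if totals is None else totals + r
        a = policy(s)
    return totals

def pareto_actions(model, state, policy, candidates, horizon=5):
    """Keep the candidate first actions whose short-horizon return vectors
    are not dominated on every objective (a simple Pareto filter)."""
    scores = np.array([rollout_returns(model, state, a, policy, horizon)
                       for a in candidates])
    keep = []
    for i, row in enumerate(scores):
        dominated = any(np.all(o >= row) and np.any(o > row)
                        for j, o in enumerate(scores) if j != i)
        if not dominated:
            keep.append(candidates[i])
    return keep

if __name__ == "__main__":
    policy = lambda s: -0.5 * s  # placeholder current policy
    suggested = pareto_actions(ToyModel(), state=1.0, policy=policy,
                               candidates=[-1.0, -0.5, 0.0, 0.5])
    print("Non-dominated candidate actions:", suggested)
```

Keeping the horizon short limits how far model error can compound, which is consistent with the abstract's claim that short-term planning yields safer trajectories and more accurate approximate models.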

Keywords: Reinforcement learning, Multi-objective, Robotic control

Article history: Received 20 December 2018, Revised 12 December 2019, Accepted 13 December 2019, Available online 19 December 2019, Version of Record 7 March 2020.

Paper URL: https://doi.org/10.1016/j.knosys.2019.105392