Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
Authors: Johannes Fürnkranz, Eyke Hüllermeier, Weiwei Cheng, Sang-Hyeun Park
Abstract
This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a preference-based approach to reinforcement learning is the observation that in many real-world domains, numerical feedback signals are not readily available, or are defined arbitrarily in order to satisfy the needs of conventional RL algorithms. Instead, we propose an alternative framework for reinforcement learning, in which qualitative reward signals can be directly used by the learner. The framework may be viewed as a generalization of the conventional RL framework in which only a partial order between policies is required instead of the total order induced by their respective expected long-term reward.
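For illustration only (the notation below is ours, not the paper's): conventional RL evaluates each policy \(\pi\) by its expected long-term reward \(J(\pi) = \mathbb{E}\!\left[\sum_{t \ge 0} \gamma^{t} r_t \mid \pi\right]\), so that \(\pi \succeq \pi' \iff J(\pi) \ge J(\pi')\) induces a total order over policies. The preference-based framework sketched in the abstract instead assumes only a partial order \(\succeq\) over policies, so some pairs of policies may be incomparable and no numeric reward signal \(r_t\) has to be specified.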
Keywords: Reinforcement learning, Preference learning
Paper URL: https://doi.org/10.1007/s10994-012-5313-8