Learning via human feedback in continuous state and action spaces

Authors: Ngo Anh Vien, Wolfgang Ertel, Tae Choong Chung

Abstract

This paper considers the problem of extending Training an Agent Manually via Evaluative Reinforcement (TAMER) to continuous state and action spaces. The TAMER framework enables a non-technical human to train an agent through a natural form of feedback, either negative or positive. Its advantages have been shown on tasks in which agents are trained by human feedback alone or by human feedback combined with environment rewards. However, these methods were originally designed for problems with discrete states and actions, or with continuous states and discrete actions. This paper proposes an extension of TAMER, called ACTAMER, that allows both continuous states and continuous actions. The new framework can use any general function approximator to model a human trainer's feedback signal. Moreover, the combination of ACTAMER with reinforcement learning is investigated and evaluated in two settings: sequential and simultaneous. Our experimental results demonstrate that the proposed method successfully allows a human to train an agent in two continuous state-action domains: Mountain Car and Cart-pole (balancing).
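As a rough illustration of modelling a trainer's feedback with a function approximator over continuous states and actions, the sketch below fits a linear model on Gaussian radial basis features to scalar feedback and picks the action with the highest predicted feedback. It is not the ACTAMER algorithm, whose details the abstract does not give. The class `HumanRewardModel`, the helpers `greedy_action` and `train_demo`, the RBF grid, the learning rate, and the simulated trainer (which rewards actions whose sign matches the state's sign) are all assumptions made for this example.

```python
import math
import random


class HumanRewardModel:
    """Hypothetical linear model H(s, a) = w . phi(s, a) on Gaussian RBF features."""

    def __init__(self, centers, width=0.5, lr=0.1):
        self.centers = centers          # RBF centers in joint (state, action) space
        self.width = width              # shared RBF width (assumed value)
        self.lr = lr                    # step size for the gradient update (assumed value)
        self.w = [0.0] * len(centers)

    def features(self, s, a):
        # Concatenate the state tuple and the scalar action into one point.
        x = tuple(s) + (a,)
        return [
            math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * self.width ** 2))
            for c in self.centers
        ]

    def predict(self, s, a):
        return sum(wi * fi for wi, fi in zip(self.w, self.features(s, a)))

    def update(self, s, a, h):
        # One stochastic gradient step on the squared error between the
        # observed feedback h and the current prediction.
        phi = self.features(s, a)
        err = h - sum(wi * fi for wi, fi in zip(self.w, phi))
        self.w = [wi + self.lr * err * fi for wi, fi in zip(self.w, phi)]


def greedy_action(model, s, candidates):
    # Approximate the argmax over a continuous action range by scoring
    # a finite set of candidate actions.
    return max(candidates, key=lambda a: model.predict(s, a))


def train_demo(steps=3000, seed=0):
    # Simulated trainer (a stand-in for a human): +1 when the action has the
    # same sign as the 1-D state, -1 otherwise.
    rng = random.Random(seed)
    grid = [-1.0, 0.0, 1.0]
    model = HumanRewardModel([(s, a) for s in grid for a in grid])
    for _ in range(steps):
        s, a = (rng.uniform(-1, 1),), rng.uniform(-1, 1)
        h = 1.0 if s[0] * a > 0 else -1.0
        model.update(s, a, h)
    return model
```

Calling `greedy_action(train_demo(), (0.8,), [-1.0, -0.5, 0.0, 0.5, 1.0])` scores five candidate actions for the state 0.8. Scoring a fixed candidate set is only one simple way to handle a continuous action range and is used here for brevity.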

Keywords: Reinforcement learning, Human-agent interaction, Interactive learning, Human teachers, Reward shaping, Continuous states, Continuous actions

Paper URL: https://doi.org/10.1007/s10489-012-0412-6