Exploration bonuses and dual control

Authors: Peter Dayan, Terrence J. Sejnowski

Abstract

Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers and its performance is compared with Sutton's DYNA system.
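
To make the idea of an exploration bonus concrete, here is a minimal sketch of one common form: a bonus that grows with the square root of the time since an action was last tried, as in Sutton's (1990) DYNA work. It is an illustration only, not the certainty equivalence method of this paper. The function names, the dictionary layout, and the value of `kappa` are assumptions made for the example.

```python
import math


def bonus_value(value, steps_since_tried, kappa=0.1):
    """Return an action value augmented with an exploration bonus.

    The bonus kappa * sqrt(steps_since_tried) grows the longer an action
    has gone untried, reflecting rising uncertainty about its outcome in
    a world that may have changed. (kappa is an assumed tuning constant.)
    """
    return value + kappa * math.sqrt(steps_since_tried)


def choose_action(q_values, last_tried, now, kappa=0.1):
    """Pick the action with the highest value-plus-bonus score.

    q_values:   dict mapping action -> current value estimate
    last_tried: dict mapping action -> time step it was last taken
    now:        current time step
    """
    scores = {
        action: bonus_value(q, now - last_tried[action], kappa)
        for action, q in q_values.items()
    }
    return max(scores, key=scores.get)


# Hypothetical example: "left" looks slightly better but was tried one
# step ago; "right" has not been tried for 100 steps.
q_values = {"left": 1.0, "right": 0.9}
last_tried = {"left": 99, "right": 0}
chosen = choose_action(q_values, last_tried, now=100)
```

With `kappa=0` the choice reduces to pure exploitation. A larger `kappa` pushes the agent to revisit long-untried actions, which matters when barriers in the maze can move.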

Keywords: Reinforcement learning, dynamic programming, exploration bonuses, certainty equivalence, non-stationary environment

Paper URL: https://doi.org/10.1007/BF00115298