Learning and planning in environments with delayed feedback

Authors: Thomas J. Walsh, Ali Nouri, Lihong Li, Michael L. Littman

Abstract

This work considers the problems of learning and planning in Markovian environments with constant observation and reward delays. We provide a hardness result for the general planning problem and positive results for several special cases with deterministic or otherwise constrained dynamics. We present an algorithm, Model Based Simulation, for planning in such environments and use model-based reinforcement learning to extend this approach to the learning setting in both finite and continuous environments. Empirical comparisons show this algorithm holds significant advantages over others for decision making in delayed-observation environments.
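As a rough illustration of planning under a constant observation delay, the sketch below rolls a known model forward through the actions taken since the last observation and then acts on the estimated current state. It is a minimal sketch under stated assumptions, not the paper's algorithm as published: it assumes deterministic dynamics, and the names `estimate_current_state`, `act_with_delay`, `chain_model`, and `chain_policy`, along with the toy 1-D chain environment, are hypothetical and introduced here for illustration only.

```python
from typing import Callable, Sequence

# Hypothetical types for a toy problem: states and actions are integers.
Model = Callable[[int, int], int]   # (state, action) -> next state
Policy = Callable[[int], int]       # state -> action (policy for the undelayed problem)


def estimate_current_state(last_observed: int,
                           pending_actions: Sequence[int],
                           model: Model) -> int:
    """Simulate the model forward through the actions whose effects
    have not been observed yet (one action per step of delay)."""
    state = last_observed
    for action in pending_actions:
        state = model(state, action)
    return state


def act_with_delay(last_observed: int,
                   pending_actions: Sequence[int],
                   model: Model,
                   policy: Policy) -> int:
    """Choose an action by applying the undelayed policy to the simulated state."""
    return policy(estimate_current_state(last_observed, pending_actions, model))


# Toy deterministic chain (assumption): states 0..10, actions -1 or +1, goal at 7.
def chain_model(state: int, action: int) -> int:
    return max(0, min(10, state + action))


def chain_policy(state: int) -> int:
    return 1 if state < 7 else -1


# Observation is 3 steps old: the agent saw state 6 and has since moved right 3 times.
stale_observation = 6
actions_since = [1, 1, 1]

naive_action = chain_policy(stale_observation)
simulated_action = act_with_delay(stale_observation, actions_since,
                                  chain_model, chain_policy)
```

In this toy case, acting on the stale observation keeps moving right even though the agent has already passed the goal, while acting on the simulated state turns it back. With stochastic dynamics the forward simulation would only yield an estimate, which this sketch does not address.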

Keywords: Reinforcement learning, Delayed feedback, Markov decision processes

Paper URL: https://doi.org/10.1007/s10458-008-9056-7