TD(λ) Converges with Probability 1

Authors: Peter Dayan, Terrence J. Sejnowski

Abstract

The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future.
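The paper proves convergence rather than presenting pseudocode, but the abstract refers to the standard TD(λ) update in which predictions are adjusted by a sampled temporal-difference error propagated through eligibility traces. As a minimal illustrative sketch (not the paper's own code), the following tabular Python implementation assumes trajectories supplied as (state, reward, next_state, done) tuples; the step size, discount, and trace-decay values are arbitrary placeholders.

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.9):
    """Tabular TD(lambda) with accumulating eligibility traces.

    `episodes` is an iterable of trajectories, each a list of
    (state, reward, next_state, done) tuples.
    """
    V = np.zeros(n_states)            # value estimates (predictions)
    for episode in episodes:
        e = np.zeros(n_states)        # eligibility traces, reset per episode
        for state, reward, next_state, done in episode:
            # Temporal-difference error: sampled one-step target minus prediction
            target = reward + (0.0 if done else gamma * V[next_state])
            delta = target - V[state]
            # Mark the visited state, then nudge all traced states toward the target
            e[state] += 1.0
            V += alpha * delta * e
            # Decay traces, so credit for the error fades with temporal distance
            e *= gamma * lam
    return V
```

Run over repeated sample trajectories from a fixed (stationary) Markov process, updates of this form behave as a stochastic approximation scheme, which is the framing under which the paper establishes convergence with probability 1.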

Keywords: reinforcement learning, temporal differences, \(\mathcal{Q}\)-learning

Paper URL: https://doi.org/10.1023/A:1022657612745