TD(λ) Converges with Probability 1
Authors: Peter Dayan, Terrence J. Sejnowski
Abstract
The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future.
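As a rough illustration of learning predictions by stochastic approximation from sampled trajectories, the sketch below runs tabular TD(λ) with accumulating eligibility traces on a small symmetric random walk. The environment, the function name `td_lambda_random_walk`, the step-size schedule, and all parameter values are assumptions chosen for illustration; they are not taken from the paper, and the code is not the construction used in its proof.

```python
import random


def td_lambda_random_walk(n_states=5, lam=0.8, episodes=5000, seed=0):
    """Tabular TD(lambda) on a symmetric random walk (hypothetical example).

    States 0..n_states-1 are non-terminal. Stepping off the left end
    terminates with outcome 0; stepping off the right end terminates
    with outcome 1. No discounting is applied.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states  # initial predictions
    for episode in range(episodes):
        # Decaying step size (assumed schedule): the sum of the step sizes
        # diverges while the sum of their squares stays finite.
        alpha = 0.1 * 100.0 / (100.0 + episode)
        traces = [0.0] * n_states
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # Temporal-difference error for the observed transition.
            td_error = reward + next_value - values[state]
            # Decay every trace by lambda, then mark the current state.
            for s in range(n_states):
                traces[s] *= lam
            traces[state] += 1.0
            # Move every prediction in proportion to its trace.
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    estimates = td_lambda_random_walk()
    targets = [(i + 1) / 6 for i in range(5)]  # exact values for this walk
    for s, (v, t) in enumerate(zip(estimates, targets)):
        print(f"state {s}: estimate {v:.3f}, exact {t:.3f}")
```

In this toy chain the exact probability of terminating on the right from state i is (i + 1)/6, so the printed estimates can be compared directly against those targets.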
Keywords: reinforcement learning, temporal differences, \(\mathcal{Q}\)-learning
Official paper URL: https://doi.org/10.1023/A:1022657612745