The Convergence of TD(λ) for General λ
Author: Peter Dayan
Abstract
The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses some analysis of Watkins (1989) to extend a convergence theorem due to Sutton (1988) from the case that uses only information from adjacent time steps to the case that uses information from arbitrarily distant ones.
Keywords: Reinforcement learning, temporal differences, asynchronous dynamic programming
Paper URL: https://doi.org/10.1023/A:1022632907294
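
For readers unfamiliar with the method named in the abstract, the sketch below shows a minimal tabular TD(λ) update with accumulating eligibility traces; it is illustrative background only, not code from the paper, and the step size, discount factor, trace scheme, and episode format are assumptions made here for the example. Setting λ = 0 recovers the one-step (adjacent time step) case covered by Sutton's original theorem, while larger λ propagates each TD error back to earlier states along the trajectory.

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.9):
    """Tabular TD(lambda) with accumulating eligibility traces.

    `episodes` is a list of trajectories, each a list of
    (state, reward, next_state) transitions; a terminal transition
    is signalled with next_state = None.
    """
    V = np.zeros(n_states)           # value estimates for each state
    for episode in episodes:
        e = np.zeros(n_states)       # eligibility traces, reset each episode
        for state, reward, next_state in episode:
            v_next = 0.0 if next_state is None else V[next_state]
            delta = reward + gamma * v_next - V[state]  # one-step TD error
            e[state] += 1.0                             # accumulate trace for visited state
            V += alpha * delta * e                      # credit all recently visited states
            e *= gamma * lam                            # decay traces; lam = 0 gives TD(0)
    return V
```

A small usage example, again purely hypothetical: `td_lambda([[(0, 0.0, 1), (1, 1.0, None)]], n_states=2)` estimates values for a two-state chain from a single episode.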