Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
Author: Vladislav B. Tadić
Abstract
The mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation is analyzed in this paper. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state-space. Under mild conditions, an upper bound on the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Moreover, under the same assumptions, this bound is shown to be linear in the step-size. The main results of the paper are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
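The algorithm class analyzed here can be illustrated with a minimal sketch of TD(0) with a constant step-size and linear function approximation. The Markov chain, rewards, and one-hot features below are hypothetical choices for illustration (the tabular special case of linear approximation), not an example from the paper; the sketch only shows the flavor of the update whose asymptotic mean-square error the paper bounds as a function of the step-size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ergodic Markov chain with 3 states (not from the paper)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
r = np.array([1.0, 0.0, -1.0])   # per-state reward
gamma = 0.9                      # discount factor

# True discounted value function, for reference: V = (I - gamma P)^{-1} r
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)

# Linear function approximation with one-hot features (tabular special case)
phi = np.eye(3)

alpha = 0.01            # constant step-size (the paper's bound is linear in alpha)
theta = np.zeros(3)     # weight vector of the linear approximator
s = 0
for _ in range(200_000):
    s_next = rng.choice(3, p=P[s])
    # TD(0) update: temporal-difference error times the feature vector
    delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * delta * phi[s]
    s = s_next
```

With a constant step-size the iterates do not converge to `V_true` but fluctuate around it, with an asymptotic mean-square error that shrinks proportionally to `alpha`, which is the behavior the paper quantifies.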
Keywords: Temporal-difference learning, Neuro-dynamic programming, Reinforcement learning, Stochastic approximation, Markov chains
Paper link: https://doi.org/10.1007/s10994-006-5835-z