Asynchronous stochastic approximation and Q-learning

Author: John N. Tsitsiklis

Abstract

We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
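As a rough illustration of the kind of iteration the abstract refers to, the sketch below runs tabular Q-learning in which a single (state, action) entry is updated per step while all other entries stay untouched. Everything concrete here is an assumption made for the example and is not taken from the paper: the two-state MDP, the Gaussian reward noise, the uniform sampling of pairs, and the step size n^(-0.6), which is just one choice whose sum diverges while its sum of squares stays finite.

```python
import random

# Hypothetical 2-state, 2-action MDP (not from the paper):
# action 0 stays in the current state, action 1 switches to the other state.
GAMMA = 0.5
NEXT_STATE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
MEAN_REWARD = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.0}


def q_learning(num_steps=20000, seed=0):
    """Tabular Q-learning that updates one (state, action) entry per step."""
    rng = random.Random(seed)
    q = {sa: 0.0 for sa in NEXT_STATE}
    visits = {sa: 0 for sa in NEXT_STATE}
    pairs = list(NEXT_STATE)
    for _ in range(num_steps):
        # Only the sampled component is updated; the rest keep their old values.
        s, a = rng.choice(pairs)
        visits[(s, a)] += 1
        # Assumed step size: sum over n diverges, sum of squares converges.
        alpha = visits[(s, a)] ** -0.6
        # Noisy reward sample around the mean (noise level is an assumption).
        reward = MEAN_REWARD[(s, a)] + rng.gauss(0.0, 0.1)
        s_next = NEXT_STATE[(s, a)]
        target = reward + GAMMA * max(q[(s_next, b)] for b in (0, 1))
        q[(s, a)] += alpha * (target - q[(s, a)])
    return q


Q = q_learning()
```

For this toy problem the optimal values can be worked out by hand from the Bellman equation: Q*(0,0) = 1.5, Q*(0,1) = 3, Q*(1,0) = 4, Q*(1,1) = 1.5, so the entries of `Q` can be compared against them.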

Keywords: Reinforcement learning, Q-learning, dynamic programming, stochastic approximation

Official paper page: https://doi.org/10.1007/BF00993306
