Q-learning

Authors: Christopher J. C. H. Watkins, Peter Dayan

Abstract

Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states.
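To make "successively improving its evaluations" concrete, here is a minimal sketch of the standard one-step tabular Q-learning update, Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max_b Q(s', b)]. The abstract itself does not spell out the rule, so the details below are illustrative assumptions: the three-state chain environment, the function names `q_update`, `step`, and `train`, the constant learning rate `alpha`, and the uniformly random exploration are all hypothetical choices made for this example.

```python
import random


def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step.

    Moves Q[s][a] part of the way toward the one-step target
    r + gamma * max_b Q[s_next][b]. Constant alpha is an assumption
    made here for simplicity.
    """
    target = r + gamma * max(Q[s_next])
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * target
    return Q[s][a]


def step(s, a):
    """Hypothetical 3-state chain: action 1 moves right, action 0 stays.

    Entering state 2 (terminal) yields reward 1; every other move yields 0.
    """
    s_next = min(s + 1, 2) if a == 1 else s
    r = 1.0 if (s != 2 and s_next == 2) else 0.0
    return r, s_next


def train(episodes=500, seed=0):
    """Run episodes with uniformly random actions and return the Q table."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(3)]  # Q[state][action]; terminal row stays 0
    for _ in range(episodes):
        s = 0
        while s != 2:
            a = rng.randrange(2)
            r, s_next = step(s, a)
            q_update(Q, s, a, r, s_next)
            s = s_next
    return Q


if __name__ == "__main__":
    for state, values in enumerate(train()):
        print(state, values)
```

Each call to `q_update` touches a single state-action pair, which is why the method is incremental and cheap per step. In this toy chain the learned values come to favour action 1 (move right) in both non-terminal states.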

Keywords: Q-learning, reinforcement learning, temporal differences, asynchronous dynamic programming

Paper URL: https://doi.org/10.1007/BF00992698
