Fast Online Q(λ)
Authors: Marco Wiering, Jürgen Schmidhuber
Abstract
Q(λ)-learning uses TD(λ)-methods to accelerate Q-learning. The update complexity of previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.
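The idea of postponing updates can be illustrated with a small sketch. The abstract does not give the update rules, so the code below assumes a Peng-and-Williams-style tabular Q(λ) with replacing traces; all names (`eager_q_lambda`, `lazy_q_lambda`, `sync`, `random_transitions`) and the constants are hypothetical and chosen only for illustration. The eager version sweeps every state/action pair on each step. The lazy version keeps one global accumulator of discounted TD errors and brings a Q-value up to date only when it is read, so each step touches only the actions of the current and next state.

```python
import random

GAMMA, LAM, ALPHA = 0.95, 0.9, 0.1  # assumed values, not from the paper


def random_transitions(n_steps, n_states, n_actions, seed=0):
    """Hypothetical helper: a fixed random trajectory of (s, a, r, s') tuples."""
    rng = random.Random(seed)
    s, out = 0, []
    for _ in range(n_steps):
        a = rng.randrange(n_actions)
        s2 = rng.randrange(n_states)
        out.append((s, a, rng.random(), s2))
        s = s2
    return out


def eager_q_lambda(transitions, n_states, n_actions):
    """Conventional online update: every step sweeps all (state, action) pairs."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    trace = [[0.0] * n_actions for _ in range(n_states)]
    for s, a, r, s2 in transitions:
        v_next = max(Q[s2])
        e_prime = r + GAMMA * v_next - Q[s][a]  # error for the executed pair
        e = r + GAMMA * v_next - max(Q[s])      # error propagated along traces
        for x in range(n_states):
            for b in range(n_actions):
                trace[x][b] *= GAMMA * LAM
                Q[x][b] += ALPHA * e * trace[x][b]
        Q[s][a] += ALPHA * e_prime
        trace[s][a] = 1.0                       # replacing trace
    return Q


def lazy_q_lambda(transitions, n_states, n_actions):
    """Postponed update: each step touches only the actions of s and s'."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    scaled_trace = [[0.0] * n_actions for _ in range(n_states)]  # trace / phi
    seen_delta = [[0.0] * n_actions for _ in range(n_states)]    # accumulator at last sync
    phi = 1.0           # (GAMMA * LAM) ** t
    global_delta = 0.0  # running sum of e_t * phi_t

    def sync(x, b):
        # Apply every update postponed since (x, b) was last brought up to date.
        Q[x][b] += ALPHA * (global_delta - seen_delta[x][b]) * scaled_trace[x][b]
        seen_delta[x][b] = global_delta

    for s, a, r, s2 in transitions:
        for b in range(n_actions):              # Q-values that are about to be read
            sync(s, b)
            sync(s2, b)
        v_next = max(Q[s2])
        e_prime = r + GAMMA * v_next - Q[s][a]
        e = r + GAMMA * v_next - max(Q[s])
        phi *= GAMMA * LAM
        global_delta += e * phi
        sync(s, a)                              # this step's update with the old trace
        Q[s][a] += ALPHA * e_prime
        scaled_trace[s][a] = 1.0 / phi          # replacing trace, stored scaled
    for x in range(n_states):                   # final flush, only to compare tables
        for b in range(n_actions):
            sync(x, b)
    return Q
```

Running both functions on the same short trajectory, for example `random_transitions(40, 4, 2)`, should give Q-tables that agree up to floating-point rounding. Because `phi` shrinks geometrically, this naive sketch loses precision on long runs; a practical implementation would need to flush all pending updates and reset the accumulators periodically.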
Keywords: reinforcement learning, Q-learning, TD(λ), online Q(λ), lazy learning
Paper URL: https://doi.org/10.1023/A:1007562800292