An analysis of model-based Interval Estimation for Markov Decision Processes
Authors:
Abstract
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents a theoretical analysis of MBIE and a new variation called MBIE-EB, proving their efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less “online” cousins from the literature.
Keywords: Reinforcement learning, Learning theory, Markov Decision Processes
Review history: Received 22 August 2007, Available online 19 September 2008.
Official paper URL: https://doi.org/10.1016/j.jcss.2007.08.009