Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs

Authors:

Abstract

Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challenging task, especially if the agent's sensors provide only noisy or partial information. In this setting, Partially Observable Markov Decision Processes (POMDPs) provide a planning framework that optimally trades off between actions that contribute to the agent's knowledge and actions that increase the agent's immediate reward. However, the task of specifying the POMDP's parameters is often onerous. In particular, setting the immediate rewards to achieve a desired balance between information-gathering and acting is often not intuitive.
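The title's use of Bayes risk for active learning can be illustrated with a minimal sketch: under uncertainty over candidate models, the agent computes the expected loss (Bayes risk) of committing to the action that looks best under its current belief, and queries an oracle when that risk is too high. This is only an illustrative toy, not the paper's algorithm; the function name, the two-model belief, and the Q-values are all invented for the example.

```python
# Illustrative sketch (not the paper's implementation): a Bayes-risk test
# for choosing between acting and querying an oracle, given a belief over
# a small set of candidate models. All names and numbers are made up.

def bayes_risk_action(belief, q_values, risk_threshold):
    """belief: dict model -> probability; q_values: dict model -> {action: Q}.

    Returns ('act', a) if the Bayes risk of the belief-optimal action is
    within the threshold, otherwise ('query', a).
    """
    actions = next(iter(q_values.values())).keys()
    # Expected Q-value of each action under the belief over models.
    exp_q = {a: sum(belief[m] * q_values[m][a] for m in belief)
             for a in actions}
    a_star = max(exp_q, key=exp_q.get)
    # Bayes risk: expected loss of a_star relative to each model's own
    # best action, weighted by the belief.
    risk = sum(belief[m] * (max(q_values[m].values()) - q_values[m][a_star])
               for m in belief)
    return ('act' if risk <= risk_threshold else 'query'), a_star

# Toy example: two candidate models that disagree about the best action.
belief = {'m1': 0.7, 'm2': 0.3}
q_values = {'m1': {'left': 1.0, 'right': 0.0},
            'm2': {'left': 0.0, 'right': 1.0}}
decision, action = bayes_risk_action(belief, q_values, risk_threshold=0.1)
```

With these numbers the belief-optimal action is 'left', but model m2 would lose 1.0 by taking it, giving a Bayes risk of 0.3; since that exceeds the 0.1 threshold, the sketch returns a 'query' decision rather than acting.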

Keywords: Partially observable Markov decision process, Reinforcement learning, Bayesian methods

Article history: Received 16 August 2011; Revised 10 February 2012; Accepted 19 April 2012; Available online 25 April 2012.

DOI: https://doi.org/10.1016/j.artint.2012.04.006