Point-based online value iteration algorithm in large POMDP

Authors: Bo Wu, Hong-Yan Zheng, Yan-Peng Feng

Abstract

The partially observable Markov decision process (POMDP) is an ideal framework for sequential decision-making under uncertainty in stochastic domains. However, solving POMDPs in real-time systems is notoriously computationally intractable. To address this problem, this paper proposes a point-based online value iteration (PBOVI) algorithm that combines three ideas: it performs value backups only at specific reachable belief points, rather than over the entire belief simplex, to speed up computation; it uses branch-and-bound pruning to prune the AND/OR tree of belief states online; and it reuses belief states that have already been searched to avoid repeated computation. Experimental and simulation results show that the proposed algorithm can simultaneously satisfy the requirements of low error and high timeliness in real-time systems.
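
To make the idea of a value backup at a single belief point concrete, the following is a minimal sketch of the generic point-based backup used in point-based value iteration. It is not the authors' PBOVI implementation: the function names `belief_update` and `point_backup`, the array layouts, and the small Tiger-style model (two states, three actions, two observations, with the commonly used benchmark numbers) are all assumptions chosen for illustration, and the branch-and-bound pruning and belief reuse described in the abstract are not shown.

```python
# Illustrative sketch only; names, layouts, and the model are assumptions.
# Layouts: T[a][s][s2] = P(s2 | s, a), O[a][s2][o] = P(o | s2, a), R[a][s].

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def belief_update(b, a, o, T, O):
    """Bayes filter: belief after taking action a and observing o."""
    n = len(b)
    unnorm = [
        O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    z = sum(unnorm)
    if z == 0:
        raise ValueError("observation has zero probability under this belief")
    return [x / z for x in unnorm]

def point_backup(b, Gamma, T, O, R, gamma):
    """Return the alpha-vector that is optimal at belief b after one backup.

    Gamma is the current set of alpha-vectors (one value per state).
    Only the single point b is backed up, not the whole belief simplex.
    """
    n = len(b)
    n_obs = len(O[0][0])
    best_vec, best_val = None, float("-inf")
    for a in range(len(T)):
        vec = [float(r) for r in R[a]]
        for o in range(n_obs):
            # Project every alpha-vector back through (a, o).
            candidates = [
                [
                    sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n))
                    for s in range(n)
                ]
                for alpha in Gamma
            ]
            # Keep the projection that is best at this particular belief.
            g = max(candidates, key=lambda c: dot(c, b))
            vec = [v + gamma * gi for v, gi in zip(vec, g)]
        val = dot(vec, b)
        if val > best_val:
            best_vec, best_val = vec, val
    return best_vec

# Hypothetical Tiger-style model (not taken from the paper).
# States: tiger-left, tiger-right. Actions: listen, open-left, open-right.
T = [
    [[1.0, 0.0], [0.0, 1.0]],  # listen: state unchanged
    [[0.5, 0.5], [0.5, 0.5]],  # open-left: problem resets
    [[0.5, 0.5], [0.5, 0.5]],  # open-right: problem resets
]
O = [
    [[0.85, 0.15], [0.15, 0.85]],  # listen: 85% accurate
    [[0.5, 0.5], [0.5, 0.5]],      # opening a door is uninformative
    [[0.5, 0.5], [0.5, 0.5]],
]
R = [[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]]

b0 = [0.5, 0.5]
b1 = belief_update(b0, 0, 0, T, O)                     # listen, hear-left
alpha = point_backup(b0, [[0.0, 0.0]], T, O, R, 0.95)  # one backup at b0
```

Starting from the zero value function, the backup at the uniform belief selects the vector for listening, since opening either door has a much lower expected reward there. An online method would apply such backups only to beliefs reachable from the current one, which is the source of the speed-up the abstract describes.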

Keywords: POMDP, Belief states, Point-based value iteration, Online, AND/OR tree

Paper URL: https://doi.org/10.1007/s10489-013-0479-8