Depth-based short-sighted stochastic shortest path problems

摘要

Stochastic Shortest Path Problems (SSPs) are a common representation for probabilistic planning problems. Two approaches can be used to solve SSPs: (i) consider all probabilistically reachable states and (ii) plan only for a subset of these reachable states. Closed policies, the solutions obtained in the former approach, require significant computational effort, and they do not require replanning, i.e., the planner is never re-invoked. The second approach, employed by replanners, computes open policies, i.e., policies for a subset of the probabilistically reachable states. Therefore, when a state is reached in which the open policy is not defined, the replanner is reinvoked to compute a new open policy. In this article, we introduce a special case of SSPs, the depth-based short-sighted SSPs, in which every state has a nonzero probability of being reached using at most t actions. We also introduce the novel algorithm Short-Sighted Probabilistic Planner (SSiPP), which solves SSPs through depth-based short-sighted SSPs and guarantees that at least t actions can be executed without replanning. Therefore, SSiPP can compute both open and closed policies: as t increases, the returned policy approaches the behavior of a closed policy, and for t large enough, the returned policy is closed. Moreover, we present two extensions to SSiPP: Labeled-SSiPP and SSiPP-FF. The former extension incorporates a labeling mechanism to avoid revisiting states that have already converged. The latter extension combines SSiPP and determinizations to improve the performance of SSiPP in problems without dead ends. We also performed an extensive empirical evaluation of SSiPP and its extensions in several problems against state-of-the-art planners. The results show that (i) Labeled-SSiPP outperforms SSiPP and the considered planners in the task of finding the optimal solution when the problems have a low percentage of relevant states; and (ii) SSiPP-FF outperforms SSiPP in the task of quickly finding suboptimal solutions to problems without dead ends while performing similarly in problems with dead ends.