inria-00529498, version 1
A POMDP Extension with Belief-dependent Rewards (Extended Version)
N° RR-7433 (2010)
Abstract: Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly involves reducing uncertainty about the state. To address this, we introduce ρPOMDPs, an extension of POMDPs where the reward function ρ depends on the belief state. We show that, under the common assumption that ρ is convex, the value function is also convex, which makes it possible to (1) approximate ρ arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes.
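As an illustration of point (1), the sketch below approximates one convex belief-dependent reward, the negative entropy ρ(b) = Σ_s b(s) log b(s), from below by the maximum over a finite set of tangent hyperplanes. The choice of negative entropy, the two-state belief space, the midpoint grid, and all function names are assumptions made for this example only; they are not taken from the report. For beliefs on the simplex, the tangent at b0 reduces to Σ_s b(s) log b0(s), so each hyperplane is represented by the vector α(s) = log b0(s).

```python
import math


def neg_entropy(b):
    """Convex belief-dependent reward: rho(b) = sum_s b(s) * log b(s)."""
    return sum(p * math.log(p) for p in b if p > 0.0)


def tangent_alpha(b0):
    """Vector alpha(s) = log b0(s) of the tangent hyperplane at an interior belief b0.

    For any belief b on the simplex, the tangent to neg_entropy at b0
    evaluates to sum_s b(s) * alpha(s).
    """
    return [math.log(p) for p in b0]


def pwlc_reward(b, alphas):
    """PWLC lower approximation: maximum of the linear functions alpha . b."""
    return max(sum(a * p for a, p in zip(alpha, b)) for alpha in alphas)


def grid_points(n):
    """n interior beliefs over two states (hypothetical midpoint grid)."""
    return [[(i + 0.5) / n, 1.0 - (i + 0.5) / n] for i in range(n)]


def max_gap(n, samples=1001):
    """Largest observed gap rho(b) - approximation(b) over sampled beliefs."""
    alphas = [tangent_alpha(b0) for b0 in grid_points(n)]
    gap = 0.0
    for k in range(samples):
        p = k / (samples - 1)
        b = [p, 1.0 - p]
        gap = max(gap, neg_entropy(b) - pwlc_reward(b, alphas))
    return gap


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, max_gap(n))
```

In this sketch the gap shrinks as tangent points are added, which is the sense in which a PWLC function can approach a convex ρ. Tangents are taken only at interior beliefs because log b0(s) is unbounded when b0(s) = 0.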
- a – INRIA
- b – Université Nancy II
- 1: INRIA – CNRS : UMR7503 – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL)
- Domain : Computer Science/Artificial Intelligence
- Keywords : partially observable Markov decision processes – reward function – active sensing – piecewise linear and convex approximation
- Internal note : RR-7433
- Available versions : v1 (2010-10-26) v2 (2010-12-15)
- http://hal.inria.fr/inria-00529498
- oai:hal.inria.fr:inria-00529498
- Submitted on: Monday, 25 October 2010 16:59:22
- Updated on: Thursday, 11 November 2010 20:20:07


