A POMDP Extension with Belief-dependent Rewards

Mauricio Araya-López 1 Olivier Buffet 1 Vincent Thomas 1 François Charpillet 1 
1 MAIA - Autonomous intelligent machine
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly involves reducing the uncertainty on the state. To that end, we introduce ρPOMDPs, an extension of POMDPs where the reward function ρ depends on the belief state. We show that, under the common assumption that ρ is convex, the value function is also convex, which makes it possible to (1) approximate ρ arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes.
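The abstract's PWLC construction can be illustrated with a small sketch. Here we pick negative entropy as a hypothetical convex belief-dependent reward ρ (a natural choice when the objective is reducing state uncertainty, though the paper itself is not restricted to this ρ). For this ρ, the tangent hyperplane at a sampled belief b_i reduces, on the probability simplex, to the α-vector α_i(s) = log b_i(s), and taking the maximum over a set of such α-vectors gives a PWLC lower bound on ρ. Function names and the sampling scheme are illustrative assumptions, not from the paper.

```python
import numpy as np

def rho(b):
    # Hypothetical convex belief-dependent reward: negative entropy,
    # which rewards low-uncertainty (peaked) beliefs.
    b = np.asarray(b, dtype=float)
    nz = b > 0
    return float(np.sum(b[nz] * np.log(b[nz])))

def tangent_alpha(b_i, eps=1e-12):
    # Tangent hyperplane of rho at belief b_i: on the simplex it
    # simplifies to alpha_i(s) = log b_i(s), so rho(b) >= alpha_i . b.
    return np.log(np.clip(np.asarray(b_i, dtype=float), eps, None))

def pwlc_lower_bound(b, alphas):
    # PWLC approximation: the max over alpha-vectors is a lower bound
    # of rho because rho is convex; it is tight at each tangent point.
    return float(max(a @ np.asarray(b, dtype=float) for a in alphas))

rng = np.random.default_rng(0)
n_states = 3
# Sample tangent points uniformly on the simplex (Dirichlet(1,...,1)).
points = rng.dirichlet(np.ones(n_states), size=50)
alphas = [tangent_alpha(p) for p in points]

b = np.array([0.6, 0.3, 0.1])
print(pwlc_lower_bound(b, alphas), "<=", rho(b))
```

Adding more tangent points tightens the bound, which is the sense in which a convex ρ can be approximated arbitrarily well by a PWLC function usable with standard α-vector-based POMDP solvers.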
Document type: Conference papers
Cited literature: 20 references
Contributor: Olivier Buffet
Submitted on: Tuesday, December 14, 2010 - 4:49:45 PM
Last modification on: Saturday, June 25, 2022 - 7:44:43 PM
Long-term archiving on: Saturday, December 3, 2016 - 12:34:22 AM


Publisher files allowed on an open archive


HAL Id: inria-00535560, version 2



Mauricio Araya-López, Olivier Buffet, Vincent Thomas, François Charpillet. A POMDP Extension with Belief-dependent Rewards. Neural Information Processing Systems - NIPS 2010, Dec 2010, Vancouver, Canada. ⟨inria-00535560v2⟩


