A POMDP Extension with Belief-dependent Rewards

Mauricio Araya-López ¹, Olivier Buffet ¹, Vincent Thomas ¹, François Charpillet ¹
¹ MAIA - Autonomous intelligent machine, INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly involves reducing the uncertainty about the state. To address this, we introduce ρPOMDPs, an extension of POMDPs where the reward function ρ depends on the belief state. We show that, under the common assumption that ρ is convex, the value function is also convex, which makes it possible to (1) approximate ρ arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes.
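The PWLC approximation mentioned in the abstract has a simple geometric reading: a convex ρ lies above each of its tangent hyperplanes, so the maximum over a set of tangents is a piecewise linear and convex lower bound that tightens as tangent points are added, which is the same representation (a maximum over α-vectors) used by standard POMDP solvers. Below is a minimal Python sketch of this idea, using the negative entropy ρ(b) = Σ_s b(s) log b(s) as an example convex reward; the function names (tangent_alpha, pwlc_rho) and the choice of tangent points are illustrative assumptions, not taken from the paper.

    import numpy as np

    def tangent_alpha(b0, eps=1e-12):
        # Tangent hyperplane of rho(b) = sum_s b(s) log b(s) at belief b0,
        # written as an alpha-vector: on the belief simplex the tangent
        # evaluates to alpha . b with alpha(s) = log b0(s)
        # (the constant terms cancel because beliefs sum to 1).
        return np.log(np.maximum(b0, eps))

    def pwlc_rho(b, alphas):
        # PWLC lower approximation of rho: maximum over tangent alpha-vectors.
        return max(float(alpha @ b) for alpha in alphas)

    # Tangents at a few belief points of an illustrative 2-state problem.
    tangent_points = [np.array([p, 1.0 - p]) for p in (0.1, 0.3, 0.5, 0.7, 0.9)]
    alphas = [tangent_alpha(b0) for b0 in tangent_points]

    b = np.array([0.25, 0.75])
    exact = float(np.sum(b * np.log(b)))   # true rho(b), here negative entropy
    approx = pwlc_rho(b, alphas)           # PWLC lower bound on rho(b)
    print(f"rho(b) = {exact:.4f}, PWLC approximation = {approx:.4f}")

Because ρ is convex, the printed approximation can only underestimate the exact value, and the gap shrinks as tangent points are added; this is the sense in which a convex ρ can be "approximated arbitrarily well" by a PWLC function.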
Document type: Conference papers

Cited literature: 20 references

https://hal.inria.fr/inria-00535560
Contributor: Olivier Buffet
Submitted on: Tuesday, December 14, 2010 - 4:49:45 PM
Last modification on: Friday, February 26, 2021 - 3:28:05 PM
Long-term archiving on: Saturday, December 3, 2016 - 12:34:22 AM

File

article.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: inria-00535560, version 2

Citation

Mauricio Araya-López, Olivier Buffet, Vincent Thomas, François Charpillet. A POMDP Extension with Belief-dependent Rewards. Neural Information Processing Systems - NIPS 2010, Dec 2010, Vancouver, Canada. ⟨inria-00535560v2⟩

Metrics

Record views: 658
File downloads: 732