Conference paper, 2010

A POMDP Extension with Belief-dependent Rewards

Mauricio Araya-López
Olivier Buffet
Vincent Thomas
François Charpillet

Abstract

Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly involves reducing the uncertainty about the state. To that end, we introduce ρPOMDPs, an extension of POMDPs in which the reward function ρ depends on the belief state. We show that, under the common assumption that ρ is convex, the value function is also convex, which makes it possible to (1) approximate ρ arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes.
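The PWLC approximation mentioned in the abstract can be illustrated on a concrete belief-dependent reward. Below is a minimal sketch (in Python with NumPy; the function names are hypothetical, not taken from the paper) that lower-bounds the convex negative-entropy reward ρ(b) = Σ_s b(s) log b(s) with tangent hyperplanes at sampled belief points. For this particular ρ, the tangent at a sampled belief b_i, evaluated at b, simplifies to Σ_s b(s) log b_i(s), i.e., the α-vector α_i(s) = log b_i(s).

import numpy as np

def tangent_alpha_vectors(belief_points, eps=1e-10):
    # For rho(b) = sum_s b(s) log b(s) (negative entropy, convex),
    # the tangent hyperplane at b_i evaluated at b simplifies to
    # sum_s b(s) log b_i(s), i.e., the alpha-vector log b_i.
    return [np.log(np.clip(b, eps, 1.0)) for b in belief_points]

def rho_pwlc(b, alphas):
    # PWLC lower bound: maximum over the tangent hyperplanes.
    return max(float(alpha @ b) for alpha in alphas)

# Two-state example: sample beliefs on the simplex, build tangents,
# and compare the PWLC value with the exact reward at a test belief.
samples = [np.array([p, 1.0 - p]) for p in (0.1, 0.3, 0.5, 0.7, 0.9)]
alphas = tangent_alpha_vectors(samples)
b = np.array([0.25, 0.75])
exact = float(np.sum(b * np.log(b)))   # rho(b) = -0.5623...
approx = rho_pwlc(b, alphas)           # -0.5685..., a close lower bound
print(f"rho(b) = {exact:.4f}, PWLC approximation = {approx:.4f}")

Because ρ is convex, every tangent lies below it, so adding more sampled beliefs can only tighten the approximation; representing ρ by a set of α-vectors in this way is what allows standard α-vector-based POMDP solvers to be reused with limited changes.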
Main file: article.pdf (122.05 KB)
Origin: publisher files authorized on an open archive

Dates and versions

inria-00535560 , version 1 (11-12-2010)
inria-00535560 , version 2 (14-12-2010)

Identifiers

  • HAL Id: inria-00535560, version 1

Cite

Mauricio Araya-López, Olivier Buffet, Vincent Thomas, François Charpillet. A POMDP Extension with Belief-dependent Rewards. Neural Information Processing Systems - NIPS 2010, Dec 2010, Vancouver, Canada. ⟨inria-00535560v1⟩
450 views
553 downloads
