
inria-00529498, version 2

A POMDP Extension with Belief-dependent Rewards (Extended Version)

Mauricio Araya-López (corresponding author) a1, Olivier Buffet a1, Vincent Thomas b1, François Charpillet a1

No. RR-7433 (2010)

Abstract: Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly involves reducing the uncertainty about the state. To address this, we introduce ρPOMDPs, an extension of POMDPs in which the reward function ρ depends on the belief state. We show that, under the common assumption that ρ is convex, the value function is also convex, which makes it possible to (1) approximate ρ arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes.
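
As an illustration of point (1), the following minimal sketch approximates a convex belief-dependent reward from below by the maximum of its tangent hyperplanes. It is not taken from the report: the choice of negative entropy as ρ, the two-state belief space, the uniform grid of tangent points, and all function names are assumptions made for this example.

```python
import math

def neg_entropy(belief):
    """Assumed convex reward rho(b) = sum_i b_i * log(b_i), with 0 * log(0) = 0."""
    return sum(p * math.log(p) for p in belief if p > 0.0)

def tangent_vector(point):
    """Alpha-vector of the hyperplane tangent to neg_entropy at an interior belief.

    Because beliefs sum to 1, the tangent at `point` reduces to
    b -> sum_i b_i * log(point_i), i.e. alpha_i = log(point_i).
    """
    return [math.log(p) for p in point]

def pwlc_reward(belief, alphas):
    """PWLC lower approximation of rho: the max over alpha-vectors of alpha . b."""
    return max(sum(a * p for a, p in zip(alpha, belief)) for alpha in alphas)

def grid_points(n):
    """Interior beliefs over two states: (k/n, 1 - k/n) for k = 1 .. n-1."""
    return [(k / n, 1.0 - k / n) for k in range(1, n)]

def max_gap(n, samples=200):
    """Largest observed gap rho(b) - PWLC(b) over a set of sampled test beliefs."""
    alphas = [tangent_vector(p) for p in grid_points(n)]
    gap = 0.0
    for j in range(1, samples):
        b = (j / samples, 1.0 - j / samples)
        gap = max(gap, neg_entropy(b) - pwlc_reward(b, alphas))
    return gap

if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, max_gap(n))
```

On the sampled beliefs, the gap shrinks as tangent points are added, and the approximation coincides with ρ at each tangent point. Since each tangent hyperplane has the same form as a POMDP α-vector, this also gives some intuition for why existing solvers might need only limited changes.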

  • a –  INRIA
  • b –  Université Nancy II
  • 1 :  MAIA (INRIA Lorraine - LORIA)
  • INRIA – CNRS : UMR7503 – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL)
  • Domain: Computer Science/Artificial Intelligence
  • Keywords: partially observable Markov decision processes – reward function – active sensing – piecewise linear and convex approximation
  • Internal reference: RR-7433
  • Available versions: v1 (26-10-2010), v2 (15-12-2010)
 
  • oai:hal.inria.fr:inria-00529498
  • Contributor:
  • Submitted on: Tuesday 14 December 2010, 16:48:53
  • Last modified on: Wednesday 15 December 2010, 11:02:27