Conference paper, 2010

A POMDP Extension with Belief-dependent Rewards

Mauricio Araya-López
Olivier Buffet
Vincent Thomas
François Charpillet

Abstract

Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly involves reducing the uncertainty about the state. To that end, we introduce ρPOMDPs, an extension of POMDPs in which the reward function ρ depends on the belief state. We show that, under the common assumption that ρ is convex, the value function is also convex, which makes it possible to (1) approximate ρ arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes.
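The PWLC approximation mentioned in the abstract can be illustrated on a concrete belief-dependent reward. Below is a minimal sketch (in Python with NumPy; the function names are hypothetical, not taken from the paper) that lower-bounds the convex negative-entropy reward ρ(b) = Σ_s b(s) log b(s) with tangent hyperplanes at sampled belief points. For this particular ρ, the tangent at a sampled belief b_i, evaluated at b, simplifies to Σ_s b(s) log b_i(s), i.e., the α-vector α_i(s) = log b_i(s).

import numpy as np

def tangent_alpha_vectors(belief_points, eps=1e-10):
    # For rho(b) = sum_s b(s) log b(s) (negative entropy, convex),
    # the tangent hyperplane at b_i evaluated at b simplifies to
    # sum_s b(s) log b_i(s), i.e., the alpha-vector log b_i.
    return [np.log(np.clip(b, eps, 1.0)) for b in belief_points]

def rho_pwlc(b, alphas):
    # PWLC lower bound: maximum over the tangent hyperplanes.
    return max(float(alpha @ b) for alpha in alphas)

# Two-state example: sample beliefs on the simplex, build tangents,
# and compare the PWLC value with the exact reward at a test belief.
samples = [np.array([p, 1.0 - p]) for p in (0.1, 0.3, 0.5, 0.7, 0.9)]
alphas = tangent_alpha_vectors(samples)
b = np.array([0.25, 0.75])
exact = float(np.sum(b * np.log(b)))   # rho(b) = -0.5623...
approx = rho_pwlc(b, alphas)           # -0.5685..., a close lower bound
print(f"rho(b) = {exact:.4f}, PWLC approximation = {approx:.4f}")

Because ρ is convex, every tangent lies below it, so adding more sampled beliefs can only tighten the approximation; representing ρ by a set of α-vectors in this way is what allows standard α-vector-based POMDP solvers to be reused with limited changes.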
Main file: article.pdf (122.05 KB)
Origin: publisher files authorized on an open archive

Dates and versions

inria-00535560 , version 1 (11-12-2010)
inria-00535560 , version 2 (14-12-2010)

Identifiers

  • HAL Id: inria-00535560, version 1

Cite

Mauricio Araya-López, Olivier Buffet, Vincent Thomas, François Charpillet. A POMDP Extension with Belief-dependent Rewards. Neural Information Processing Systems - NIPS 2010, Dec 2010, Vancouver, Canada. ⟨inria-00535560v1⟩
450 views
553 downloads
