Active Learning of MDP Models

Mauricio Araya-López (1), Olivier Buffet (1), Vincent Thomas (1), François Charpillet (1)
1 MAIA - Autonomous intelligent machine
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: We consider the active learning problem of inferring the transition model of a Markov Decision Process by acting and observing transitions. This is particularly useful when no reward function is defined a priori. Our proposal is to cast the active learning task as a utility maximization problem using Bayesian reinforcement learning with belief-dependent rewards. After presenting three possible performance criteria, we derive from them the belief-dependent rewards to be used in the decision-making process. As computing the optimal Bayesian value function is intractable for large horizons, we use a simple algorithm to approximately solve this optimization problem. Despite the sub-optimality of this technique, we show experimentally that our proposal is efficient in a number of domains.
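Illustrative sketch (not taken from the paper): the abstract does not say which three criteria or which approximate solver the authors use, so everything below is an assumption made only to show what a belief-dependent reward can look like. The belief over each transition distribution is assumed to be a Dirichlet, the reward is assumed to be the expected reduction in total posterior variance, and the agent is assumed to be purely myopic (one-step greedy). All function names are hypothetical.

```python
import random


def dirichlet_total_variance(alpha):
    """Sum of the marginal variances of a Dirichlet(alpha) distribution."""
    a0 = sum(alpha)
    return sum(a * (a0 - a) for a in alpha) / (a0 * a0 * (a0 + 1.0))


def expected_variance_reduction(alpha):
    """Assumed belief-dependent reward: expected drop in total variance
    after observing one more transition drawn from the current belief."""
    a0 = sum(alpha)
    before = dirichlet_total_variance(alpha)
    after = 0.0
    for j, a in enumerate(alpha):
        updated = list(alpha)
        updated[j] += 1.0  # posterior if next state j is observed
        after += (a / a0) * dirichlet_total_variance(updated)
    return before - after


def greedy_active_learning(true_model, n_steps, seed=0):
    """Myopic exploration: in each state, take the action whose belief
    is expected to improve most. true_model[s][a] is a list of next-state
    probabilities, used here only to simulate the environment."""
    rng = random.Random(seed)
    n_states = len(true_model)
    n_actions = len(true_model[0])
    # Uniform Dirichlet(1, ..., 1) prior for every (state, action) pair.
    counts = [[[1.0] * n_states for _ in range(n_actions)]
              for _ in range(n_states)]
    state = 0
    for _ in range(n_steps):
        action = max(range(n_actions),
                     key=lambda a: expected_variance_reduction(counts[state][a]))
        next_state = rng.choices(range(n_states),
                                 weights=true_model[state][action])[0]
        counts[state][action][next_state] += 1.0  # Bayesian belief update
        state = next_state
    return counts
```

The reward depends only on the current belief (the Dirichlet counts), not on the visited state, which is the property that lets exploration be posed as utility maximization. A lookahead planner would replace the greedy `max` with a search over future beliefs.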
Document type:
Conference paper
European Workshop on Reinforcement Learning, Sep 2011, Athens, Greece. 2011

https://hal.inria.fr/hal-00642909
Contributor: Olivier Buffet
Submitted on: Saturday, 19 November 2011 - 16:21:46
Last modified on: Thursday, 11 January 2018 - 06:19:51
Document(s) archived on: Friday, 16 November 2012 - 11:30:43

File

EWRL-article.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00642909, version 1

Citation

Mauricio Araya-López, Olivier Buffet, Vincent Thomas, François Charpillet. Active Learning of MDP Models. European Workshop on Reinforcement Learning, Sep 2011, Athens, Greece. 2011. 〈hal-00642909〉
