hal-00642909, version 1

Active Learning of MDP Models

Mauricio Araya-López a1, Olivier Buffet (http://www.loria.fr/~buffet/) 1, Vincent Thomas b1, François Charpillet (http://www.loria.fr/~charp/) a1

European Workshop On Reinforcement Learning (2011)

Abstract: We consider the active learning problem of inferring the transition model of a Markov Decision Process by acting and observing transitions. This is particularly useful when no reward function is a priori defined. Our proposal is to cast the active learning task as a utility maximization problem using Bayesian reinforcement learning with belief-dependent rewards. After presenting three possible performance criteria, we derive from them the belief-dependent rewards to be used in the decision-making process. As computing the optimal Bayesian value function is intractable for large horizons, we use a simple algorithm to approximately solve this optimization problem. Despite the sub-optimality of this technique, we show experimentally that our proposal is efficient in a number of domains.

  • a –  INRIA
  • b –  Université Nancy II
  • 1:  MAIA (INRIA Lorraine - LORIA)
  • INRIA – CNRS : UMR7503 – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL)
  • Domain : Computer Science/Artificial Intelligence
 
  • oai:hal.inria.fr:hal-00642909
  • Submitted on: Saturday, 19 November 2011 16:21:46
  • Updated on: Saturday, 19 November 2011 18:29:39