hal-00642909, version 1
Active Learning of MDP Models
European Workshop On Reinforcement Learning (2011)
Abstract: We consider the active learning problem of inferring the transition model of a Markov Decision Process by acting and observing transitions. This is particularly useful when no reward function is a priori defined. Our proposal is to cast the active learning task as a utility maximization problem using Bayesian reinforcement learning with belief-dependent rewards. After presenting three possible performance criteria, we derive from them the belief-dependent rewards to be used in the decision-making process. As computing the optimal Bayesian value function is intractable for large horizons, we use a simple algorithm to approximately solve this optimization problem. Despite the sub-optimality of this technique, we show experimentally that our proposal is efficient in a number of domains.
- a – INRIA
- b – Université Nancy II
- 1: INRIA – CNRS : UMR7503 – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL)
- Domain : Computer Science/Artificial Intelligence
- http://hal.inria.fr/hal-00642909
- oai:hal.inria.fr:hal-00642909
- Submitted on: Saturday, 19 November 2011 16:21:46
- Updated on: Saturday, 19 November 2011 18:29:39


