Conference papers

Learning Exploration Strategies in Model-Based Reinforcement Learning

Todd Hester 1, Peter Stone 1, Manuel Lopes 2
2 Flowers - Flowing Epigenetic Robots and Systems
Inria Bordeaux - Sud-Ouest, U2IS - Unité d'Informatique et d'Ingénierie des Systèmes
Abstract: Reinforcement learning (RL) is a paradigm for learning sequential decision-making tasks. However, the user typically must hand-tune exploration parameters for each domain and/or algorithm they use. In this work, we present an algorithm called LEO for learning these exploration strategies on-line. The algorithm uses bandit-type algorithms to adaptively select exploration strategies based on the rewards received when following them. We show empirically that this method performs well across a set of five domains. In contrast, for a given algorithm, no set of parameters is best across all domains. Our results demonstrate that the LEO algorithm successfully learns the best exploration strategies on-line, increasing the received reward over static parameterizations of exploration and reducing the need for hand-tuning exploration parameters.
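The abstract's core idea, a bandit that picks an exploration strategy and is updated with the reward earned while following it, can be illustrated with a minimal sketch. The abstract does not say which bandit algorithm is used, so the code below uses UCB1 as a stand-in. The class name, the three strategies, their reward probabilities, and the exploration constant are all hypothetical, and the toy `run_episode` function replaces a real RL agent. UCB1 in this form assumes rewards scaled to [0, 1].

```python
import math
import random


class UCB1StrategySelector:
    """UCB1 bandit over a fixed set of exploration strategies (the arms).

    Illustrative stand-in only: not the bandit algorithm from the paper.
    """

    def __init__(self, n_strategies, c=2.0):
        self.counts = [0] * n_strategies   # times each strategy was followed
        self.means = [0.0] * n_strategies  # running mean reward per strategy
        self.c = c                         # exploration constant (assumed value)

    def select(self):
        # Follow every strategy once before trusting the confidence bounds.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)
        scores = [
            m + math.sqrt(self.c * math.log(total) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, i, reward):
        # Incremental mean update for the strategy that was just followed.
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]


def run_episode(strategy_index, rng):
    """Toy stand-in for running an RL agent under one exploration strategy.

    Returns a reward in {0, 1}; the success probabilities are made up.
    """
    success_prob = [0.2, 0.5, 0.8]
    return 1.0 if rng.random() < success_prob[strategy_index] else 0.0


def learn(n_episodes=2000, seed=0):
    rng = random.Random(seed)
    selector = UCB1StrategySelector(n_strategies=3)
    for _ in range(n_episodes):
        i = selector.select()
        selector.update(i, run_episode(i, rng))
    return selector
```

Calling `learn()` and inspecting `selector.counts` shows how often each hypothetical strategy was followed. In a real setting, `run_episode` would run the agent with the chosen exploration parameters and return its normalized reward.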
Document type: Conference papers
Contributor: Manuel Lopes
Submitted on: Thursday, October 10, 2013 - 5:38:29 PM
Last modification on: Friday, December 3, 2021 - 11:34:06 AM


  • HAL Id: hal-00871861, version 1



Todd Hester, Peter Stone, Manuel Lopes. Learning Exploration Strategies in Model-Based Reinforcement Learning. AAMAS 2013 - 12th International Conference on Autonomous Agents and Multiagent Systems, May 2013, St. Paul, MN, United States. pp.1069-1076. ⟨hal-00871861⟩


