Learning Exploration Strategies in Model-Based Reinforcement Learning

Todd Hester 1, Peter Stone 1, Manuel Lopes 2
2 Flowers - Flowing Epigenetic Robots and Systems
Inria Bordeaux - Sud-Ouest, U2IS - Unité d'Informatique et d'Ingénierie des Systèmes
Abstract: Reinforcement learning (RL) is a paradigm for learning sequential decision-making tasks. Typically, however, the user must hand-tune exploration parameters for each domain and each algorithm they use. In this work, we present an algorithm called LEO for learning these exploration strategies online. The algorithm uses bandit-type algorithms to adaptively select exploration strategies based on the rewards received when following them. We show empirically that this method performs well across a set of five domains. In contrast, for a given algorithm, no single set of parameters is best across all domains. Our results demonstrate that LEO successfully learns the best exploration strategies online, increasing the received reward over static parameterizations of exploration and reducing the need to hand-tune exploration parameters.
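The selection mechanism described in the abstract can be illustrated with a small sketch. The abstract does not say which bandit algorithm is used, so the code below substitutes UCB1 purely as an illustrative choice. The strategy names, the toy reward model, and the `StrategySelector` and `run_demo` helpers are all hypothetical, and returns are assumed to be scaled to [0, 1]. It is not the authors' implementation.

```python
import math
import random


class StrategySelector:
    """UCB1 bandit over a fixed set of exploration strategies (illustrative)."""

    def __init__(self, strategy_names):
        self.names = list(strategy_names)
        self.counts = [0] * len(self.names)   # times each strategy was followed
        self.means = [0.0] * len(self.names)  # running mean return per strategy

    def select(self):
        # Follow every strategy once before relying on confidence bounds.
        for i, count in enumerate(self.counts):
            if count == 0:
                return i
        total = sum(self.counts)
        scores = [
            self.means[i] + math.sqrt(2.0 * math.log(total) / self.counts[i])
            for i in range(len(self.names))
        ]
        return max(range(len(self.names)), key=scores.__getitem__)

    def update(self, index, episode_return):
        # Incremental mean of the returns (assumed to lie in [0, 1]).
        self.counts[index] += 1
        self.means[index] += (episode_return - self.means[index]) / self.counts[index]


def run_demo(num_episodes=2000, seed=0):
    rng = random.Random(seed)
    # Hypothetical strategies with hidden mean returns in a toy domain.
    true_means = {"greedy": 0.3, "epsilon_greedy": 0.5, "optimistic_bonus": 0.8}
    selector = StrategySelector(true_means)
    for _ in range(num_episodes):
        i = selector.select()
        name = selector.names[i]
        # Stand-in for running one episode of the RL agent with this strategy.
        episode_return = 1.0 if rng.random() < true_means[name] else 0.0
        selector.update(i, episode_return)
    return selector
```

In a real agent, the simulated Bernoulli return would be replaced by the reward actually collected while following the chosen exploration strategy, so that strategies yielding more reward in the current domain are selected more often.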
Document type:
Conference paper
AAMAS 2013 - 12th International Conference on Autonomous Agents and Multiagent Systems, May 2013, St. Paul, MN, United States. ACM, AAMAS '13 Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems, pp.1069-1076, 2013, 〈http://www.ifaamas.org/Proceedings/aamas2013/docs/p1069.pdf〉
https://hal.inria.fr/hal-00871861
Contributor: Manuel Lopes
Submitted on: Thursday, October 10, 2013 - 17:38:29
Last modified on: Thursday, April 12, 2018 - 15:36:34

Identifiers

  • HAL Id: hal-00871861, version 1

Citation

Todd Hester, Peter Stone, Manuel Lopes. Learning Exploration Strategies in Model-Based Reinforcement Learning. AAMAS 2013 - 12th International Conference on Autonomous Agents and Multiagent Systems, May 2013, St. Paul, MN, United States. ACM, AAMAS '13 Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems, pp.1069-1076, 2013, 〈http://www.ifaamas.org/Proceedings/aamas2013/docs/p1069.pdf〉. 〈hal-00871861〉