Bandit-Based Genetic Programming with Application to Reinforcement Learning

J.-B Hoock 1, 2 O Teytaud 1, 2, *
* Auteur correspondant
2 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract : When looking for relevant mutations of a learning program, a main trouble is that evaluating a mutation is noisy; we can have a precise estimate of a mutation, if we test it many times, but this is quite expensive; or we can have a rough estimate, which is much faster. This is a load balancing problem: on which mutations should we spend more effort ? Bandit algorithms have been used for this load balancing: they choose the com-putational effort spent on various possible mutations, depending on the current estimate of the quality of a mutation and on the precision of this estimate. How-ever, in many cases, we want to validate some possible mutations; when should we stop the bandit mutation, and analyze new mutations ? Racing algorithms are aimed at combining the load balancing and the statistical validation; we here mathematically analyze and experiment racing algorithms in the context of mu-tations of programs, i.e. genetic programming. As an application, we consider Monte-Carlo Tree Search. Monte-Carlo Tree Search is a recent very successful algorithm for reinforcement learning, success-fully applied in games and Markov decision Processes. We consider the valida-tion of randomly generated patterns in a Monte-Carlo Tree Search program. Our bandit-based genetic programming (BGP) algorithm, with proved mathematical properties, outperformed a highly optimized handcrafted module of a well-known computer-Go program with several world records in the game of Go.
Type de document :
Communication dans un congrès
Conférence Francophone d'Apprentissage 2010, May 2010, Clermont-Ferrand, France
Liste complète des métadonnées

Littérature citée [21 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/hal-01098456
Contributeur : Jean-Baptiste Hoock <>
Soumis le : mercredi 24 décembre 2014 - 18:47:31
Dernière modification le : jeudi 11 janvier 2018 - 06:22:14
Document(s) archivé(s) le : mercredi 25 mars 2015 - 10:17:15

Fichier

bgp_cap10.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : hal-01098456, version 1

Citation

J.-B Hoock, O Teytaud. Bandit-Based Genetic Programming with Application to Reinforcement Learning. Conférence Francophone d'Apprentissage 2010, May 2010, Clermont-Ferrand, France. 〈hal-01098456〉

Partager

Métriques

Consultations de la notice

127

Téléchargements de fichiers

93