Bandit-Based Genetic Programming with Application to Reinforcement Learning - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Communication Dans Un Congrès Année : 2010

Bandit-Based Genetic Programming with Application to Reinforcement Learning

Résumé

When looking for relevant mutations of a learning program, a main trouble is that evaluating a mutation is noisy; we can have a precise estimate of a mutation, if we test it many times, but this is quite expensive; or we can have a rough estimate, which is much faster. This is a load balancing problem: on which mutations should we spend more effort ? Bandit algorithms have been used for this load balancing: they choose the com-putational effort spent on various possible mutations, depending on the current estimate of the quality of a mutation and on the precision of this estimate. How-ever, in many cases, we want to validate some possible mutations; when should we stop the bandit mutation, and analyze new mutations ? Racing algorithms are aimed at combining the load balancing and the statistical validation; we here mathematically analyze and experiment racing algorithms in the context of mu-tations of programs, i.e. genetic programming. As an application, we consider Monte-Carlo Tree Search. Monte-Carlo Tree Search is a recent very successful algorithm for reinforcement learning, success-fully applied in games and Markov decision Processes. We consider the valida-tion of randomly generated patterns in a Monte-Carlo Tree Search program. Our bandit-based genetic programming (BGP) algorithm, with proved mathematical properties, outperformed a highly optimized handcrafted module of a well-known computer-Go program with several world records in the game of Go.
Fichier principal
Vignette du fichier
bgp_cap10.pdf (169.56 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-01098456 , version 1 (24-12-2014)

Identifiants

  • HAL Id : hal-01098456 , version 1

Citer

J.-B Hoock, O Teytaud. Bandit-Based Genetic Programming with Application to Reinforcement Learning. Conférence Francophone d'Apprentissage 2010, May 2010, Clermont-Ferrand, France. ⟨hal-01098456⟩
214 Consultations
109 Téléchargements

Partager

Gmail Facebook X LinkedIn More