Skip to Main content Skip to Navigation
Conference papers

Bandit-Based Genetic Programming with Application to Reinforcement Learning

J.-B Hoock 1, 2 O Teytaud 1, 2, *
* Corresponding author
2 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract : When looking for relevant mutations of a learning program, a main trouble is that evaluating a mutation is noisy; we can have a precise estimate of a mutation, if we test it many times, but this is quite expensive; or we can have a rough estimate, which is much faster. This is a load balancing problem: on which mutations should we spend more effort ? Bandit algorithms have been used for this load balancing: they choose the com-putational effort spent on various possible mutations, depending on the current estimate of the quality of a mutation and on the precision of this estimate. How-ever, in many cases, we want to validate some possible mutations; when should we stop the bandit mutation, and analyze new mutations ? Racing algorithms are aimed at combining the load balancing and the statistical validation; we here mathematically analyze and experiment racing algorithms in the context of mu-tations of programs, i.e. genetic programming. As an application, we consider Monte-Carlo Tree Search. Monte-Carlo Tree Search is a recent very successful algorithm for reinforcement learning, success-fully applied in games and Markov decision Processes. We consider the valida-tion of randomly generated patterns in a Monte-Carlo Tree Search program. Our bandit-based genetic programming (BGP) algorithm, with proved mathematical properties, outperformed a highly optimized handcrafted module of a well-known computer-Go program with several world records in the game of Go.
Document type :
Conference papers
Complete list of metadatas

Cited literature [21 references]  Display  Hide  Download
Contributor : Jean-Baptiste Hoock <>
Submitted on : Wednesday, December 24, 2014 - 6:47:31 PM
Last modification on : Wednesday, October 14, 2020 - 3:41:42 AM
Long-term archiving on: : Wednesday, March 25, 2015 - 10:17:15 AM


Files produced by the author(s)


  • HAL Id : hal-01098456, version 1



J.-B Hoock, O Teytaud. Bandit-Based Genetic Programming with Application to Reinforcement Learning. Conférence Francophone d'Apprentissage 2010, May 2010, Clermont-Ferrand, France. ⟨hal-01098456⟩



Record views


Files downloads