Bandit Algorithms for Tree Search

Pierre-Arnaud Coquelin (1), Rémi Munos (1)
(1) SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract: Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [Gelly et al., 2006]. Their efficient exploration of the tree makes it possible to return a good value quickly and to improve precision if more time is provided. The UCT algorithm [Kocsis and Szepesvari, 2006], a tree search method based on Upper Confidence Bounds (UCB) [Auer et al., 2002], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is "over-optimistic" in some sense, leading to a worst-case regret that may be very poor. We propose alternative bandit algorithms for tree search. First, we analyze a modification of UCT that uses a confidence sequence scaling exponentially in the horizon depth. We then consider Flat-UCB performed on the leaves and provide a finite regret bound that holds with high probability. Next, we introduce and analyze a Bandit Algorithm for Smooth Trees (BAST), which takes into account the actual smoothness of the rewards to perform efficient "cuts" of sub-optimal branches with high confidence. Finally, we present an incremental tree expansion that applies when the full tree is too big (possibly infinite) to be entirely represented, and show that, with high probability, only the optimal branches are developed indefinitely. We illustrate these methods on the global optimization of a continuous function given noisy values.
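As a rough illustration of the "Flat-UCB performed on the leaves" idea mentioned in the abstract, the sketch below treats each leaf as one arm of a bandit and ignores the tree structure. It is only a minimal sketch under stated assumptions: it uses the standard UCB1 bonus sqrt(2 ln t / n) from [Auer et al., 2002], which is not necessarily the confidence sequence analyzed in the paper, and the Bernoulli rewards, the function names (`ucb1_index`, `flat_ucb`) and the parameters are hypothetical choices made for this example.

```python
import math
import random


def ucb1_index(mean, pulls, total_pulls):
    """Standard UCB1 index: empirical mean plus an exploration bonus.

    Assumption: the bonus is the classical sqrt(2 ln t / n) term, used here
    only for illustration.
    """
    return mean + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def flat_ucb(leaf_means, n_rounds, seed=0):
    """Run UCB1 directly on the leaves, ignoring any tree structure.

    leaf_means: hypothetical Bernoulli success probability of each leaf.
    Returns the number of times each leaf was sampled.
    """
    rng = random.Random(seed)
    k = len(leaf_means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, n_rounds + 1):
        if t <= k:
            # Initialization: sample every leaf once.
            arm = t - 1
        else:
            # Pick the leaf with the largest upper confidence bound,
            # computed from the t - 1 samples collected so far.
            arm = max(
                range(k),
                key=lambda i: ucb1_index(sums[i] / pulls[i], pulls[i], t - 1),
            )
        # Noisy reward: 1 with probability leaf_means[arm], else 0.
        reward = 1.0 if rng.random() < leaf_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls
```

For instance, `flat_ucb([0.1, 0.2, 0.9], 2000)` should concentrate most samples on the third leaf, while the other leaves are sampled only often enough to rule them out.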
Document type:
Conference paper
Uncertainty in Artificial Intelligence, 2007, Vancouver, Canada. 2007
Contributor: Rémi Munos
Submitted on: Tuesday, May 29, 2007 - 17:30:52
Last modified on: Thursday, January 11, 2018 - 06:22:13
Archived on: Thursday, April 8, 2010 - 18:16:55


Files produced by the author(s)


  • HAL Id: inria-00150207, version 1



Pierre-Arnaud Coquelin, Rémi Munos. Bandit Algorithms for Tree Search. Uncertainty in Artificial Intelligence, 2007, Vancouver, Canada. 2007. 〈inria-00150207〉


