Bandit Algorithms for Tree Search

Pierre-Arnaud Coquelin; Rémi Munos

Conference paper, 2007


Authors

Pierre-Arnaud Coquelin (HAL PersonId: 837679), Sequential Learning

Rémi Munos (HAL PersonId: 836863), Sequential Learning

Abstract

Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [Gelly et al., 2006]. Their efficient exploration of the tree allows them to return a good value quickly and to improve its precision if more time is provided. The UCT algorithm [Kocsis and Szepesvári, 2006], a tree search method based on Upper Confidence Bounds (UCB) [Auer et al., 2002], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is "over-optimistic" in some sense, leading to a worst-case regret that may be very poor. We propose alternative bandit algorithms for tree search. First, we analyze a modification of UCT that uses a confidence sequence scaling exponentially with the horizon depth. We then consider Flat-UCB performed on the leaves and provide a finite regret bound with high probability. Next, we introduce and analyze a Bandit Algorithm for Smooth Trees (BAST), which takes into account the actual smoothness of the rewards to perform efficient "cuts" of sub-optimal branches with high confidence. Finally, we present an incremental tree expansion that applies when the full tree is too big (possibly infinite) to be represented entirely, and show that, with high probability, only the optimal branches are developed indefinitely. We illustrate these methods on the global optimization of a continuous function observed through noisy values.
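To make the Flat-UCB baseline concrete: it runs a UCB-type bandit directly over the leaves of the tree, treating each leaf as an independent arm and ignoring the internal structure. The following is a minimal sketch, assuming the standard UCB1 index of [Auer et al., 2002] (the confidence sequences analyzed in the paper may differ) and a hypothetical vector of leaf means observed with Gaussian noise clipped to [0, 1]. The names `flat_ucb`, `ucb1_index`, `leaf_means` and the noise model are illustrative assumptions, not the authors' code.

```python
import math
import random


def ucb1_index(mean, pulls, total):
    # UCB1 index: empirical mean plus the exploration bonus sqrt(2 ln n / n_i),
    # where n is the total number of pulls so far and n_i the pulls of this arm.
    return mean + math.sqrt(2.0 * math.log(total) / pulls)


def flat_ucb(leaf_means, n_rounds, noise=0.1, seed=0):
    """Flat-UCB sketch: run UCB1 with every leaf of the tree as an arm.

    leaf_means: hypothetical true expected rewards of the leaves (assumption).
    Each pull returns the leaf mean plus Gaussian noise, clipped to [0, 1]
    so that rewards stay in the range UCB1 assumes.
    Returns the number of times each leaf was pulled.
    """
    rng = random.Random(seed)
    k = len(leaf_means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, n_rounds + 1):
        if t <= k:
            # Initialization: pull every leaf once so each index is defined.
            arm = t - 1
        else:
            # Pick the leaf with the highest upper confidence bound.
            arm = max(
                range(k),
                key=lambda i: ucb1_index(sums[i] / pulls[i], pulls[i], t - 1),
            )
        reward = leaf_means[arm] + rng.gauss(0.0, noise)
        reward = min(1.0, max(0.0, reward))
        pulls[arm] += 1
        sums[arm] += reward
    return pulls


if __name__ == "__main__":
    # Hypothetical leaf means of a depth-3 binary tree (8 leaves).
    means = [0.2, 0.3, 0.25, 0.9, 0.4, 0.1, 0.5, 0.35]
    print(flat_ucb(means, 5000))
```

In this sketch the leaf with mean 0.9 should receive most of the 5000 pulls. Note that a binary tree of depth D has 2^D leaves and every leaf must be tried once before its UCB1 index is defined, so this flat approach pays a cost that grows exponentially with depth.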

Keywords

Bandit algorithm; Tree search; Concentration inequality; Stochastic optimization

Subjects

Machine Learning [cs.LG]; Optimization and Control [math.OC]; Probability [math.PR]

Main file

BAST.pdf (230.68 KB)

Origin: files produced by the author(s)


https://inria.hal.science/inria-00150207

Submitted on: Tuesday, 29 May 2007, 17:30:52

Last modified on: Friday, 24 March 2023, 14:52:48

Long-term archiving on: Thursday, 8 April 2010, 18:16:55

Dates and versions

inria-00150207, version 1 (29-05-2007)

Identifiers

HAL Id: inria-00150207, version 1

Cite

Pierre-Arnaud Coquelin, Rémi Munos. Bandit Algorithms for Tree Search. Uncertainty in Artificial Intelligence, 2007, Vancouver, Canada. ⟨inria-00150207⟩


Collections

UNIV-LILLE3, CNRS, INRIA, INSMI, LAGIS, INRIA2, TDS-MACS

