Continuous Upper Confidence Trees with Polynomial Exploration - Consistency - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Communication Dans Un Congrès Année : 2013

Continuous Upper Confidence Trees with Polynomial Exploration - Consistency

Résumé

Upper Confidence Trees (UCT) are now a well known algorithm for sequential decision making; it is a provably consistent variant of Monte-Carlo Tree Search. However, the consistency is only proved in a the case where both the action space is finite. We here propose a proof in the case of fully observable Markov Decision Processes with bounded horizon, possibly including infinitely many states and infinite action spaces and arbitrary stochastic transition kernels. We illustrate the consistency on two benchmark problems, one being a legacy toy problem, the other a more challenging one, the famous energy unit commitment problem.
Fichier principal
Vignette du fichier
doublePWConf.pdf (197.26 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-00835352 , version 1 (18-06-2013)

Identifiants

  • HAL Id : hal-00835352 , version 1

Citer

David Auger, Adrien Couetoux, Olivier Teytaud. Continuous Upper Confidence Trees with Polynomial Exploration - Consistency. ECML/PKKD 2013, Sep 2013, Prague, Czech Republic. pp.194-209. ⟨hal-00835352⟩
289 Consultations
1908 Téléchargements

Partager

Gmail Facebook X LinkedIn More