Skip to Main content Skip to Navigation
Conference papers

Continuous Upper Confidence Trees with Polynomial Exploration - Consistency

David Auger 1 Adrien Couetoux 2 Olivier Teytaud 2, 3
1 AlCAAP - Algorithmique, Combinatoire Analytique et Applications
PRISM - Parallélisme, Réseaux, Systèmes, Modélisation
3 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract : Upper Confidence Trees (UCT) are now a well known algorithm for sequential decision making; it is a provably consistent variant of Monte-Carlo Tree Search. However, the consistency is only proved in a the case where both the action space is finite. We here propose a proof in the case of fully observable Markov Decision Processes with bounded horizon, possibly including infinitely many states and infinite action spaces and arbitrary stochastic transition kernels. We illustrate the consistency on two benchmark problems, one being a legacy toy problem, the other a more challenging one, the famous energy unit commitment problem.
Complete list of metadata

Cited literature [14 references]  Display  Hide  Download
Contributor : Adrien Couetoux Connect in order to contact the contributor
Submitted on : Tuesday, June 18, 2013 - 3:26:31 PM
Last modification on : Thursday, January 20, 2022 - 5:33:21 PM
Long-term archiving on: : Thursday, September 19, 2013 - 4:11:19 AM


Files produced by the author(s)


  • HAL Id : hal-00835352, version 1



David Auger, Adrien Couetoux, Olivier Teytaud. Continuous Upper Confidence Trees with Polynomial Exploration - Consistency. ECML/PKKD 2013, Sep 2013, Prague, Czech Republic. pp.194-209. ⟨hal-00835352⟩



Les métriques sont temporairement indisponibles