Continuous Upper Confidence Trees with Polynomial Exploration - Consistency

David Auger 1 Adrien Couetoux 2 Olivier Teytaud 2, 3
1 AlCAAP - Algorithmique, Combinatoire Analytique et Applications
PRISM - Parallélisme, Réseaux, Systèmes, Modélisation
3 TAO - Machine Learning and Optimisation
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract : Upper Confidence Trees (UCT) are now a well known algorithm for sequential decision making; it is a provably consistent variant of Monte-Carlo Tree Search. However, the consistency is only proved in a the case where both the action space is finite. We here propose a proof in the case of fully observable Markov Decision Processes with bounded horizon, possibly including infinitely many states and infinite action spaces and arbitrary stochastic transition kernels. We illustrate the consistency on two benchmark problems, one being a legacy toy problem, the other a more challenging one, the famous energy unit commitment problem.
Complete list of metadatas

Cited literature [14 references]  Display  Hide  Download

https://hal.inria.fr/hal-00835352
Contributor : Adrien Couetoux <>
Submitted on : Tuesday, June 18, 2013 - 3:26:31 PM
Last modification on : Thursday, April 5, 2018 - 12:30:12 PM
Long-term archiving on : Thursday, September 19, 2013 - 4:11:19 AM

File

doublePWConf.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00835352, version 1

Collections

Citation

David Auger, Adrien Couetoux, Olivier Teytaud. Continuous Upper Confidence Trees with Polynomial Exploration - Consistency. ECML/PKKD 2013, Sep 2013, Prague, Czech Republic. pp.194-209. ⟨hal-00835352⟩

Share

Metrics

Record views

566

Files downloads

1018