Continuous Upper Confidence Trees with Polynomial Exploration - Consistency - Archive ouverte HAL Access content directly
Conference Papers Year : 2013

Continuous Upper Confidence Trees with Polynomial Exploration - Consistency

Abstract

Upper Confidence Trees (UCT) are now a well known algorithm for sequential decision making; it is a provably consistent variant of Monte-Carlo Tree Search. However, the consistency is only proved in a the case where both the action space is finite. We here propose a proof in the case of fully observable Markov Decision Processes with bounded horizon, possibly including infinitely many states and infinite action spaces and arbitrary stochastic transition kernels. We illustrate the consistency on two benchmark problems, one being a legacy toy problem, the other a more challenging one, the famous energy unit commitment problem.
Fichier principal
Vignette du fichier
doublePWConf.pdf (197.26 Ko) Télécharger le fichier
Origin : Files produced by the author(s)
Loading...

Dates and versions

hal-00835352 , version 1 (18-06-2013)

Identifiers

  • HAL Id : hal-00835352 , version 1

Cite

David Auger, Adrien Couetoux, Olivier Teytaud. Continuous Upper Confidence Trees with Polynomial Exploration - Consistency. ECML/PKKD 2013, Sep 2013, Prague, Czech Republic. pp.194-209. ⟨hal-00835352⟩
278 View
1724 Download

Share

Gmail Facebook Twitter LinkedIn More