A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences
Conference paper, year: 2011

Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz

Abstract

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite support (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is a finite-time analysis of this algorithm: we obtain bounds whose main terms are smaller than those of previously known algorithms with finite-time analyses (such as UCB-type algorithms).
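
As a rough illustration of the kind of quantity a Kullback-Leibler-based bandit index relies on, the sketch below computes the KL divergence between two distributions on a shared finite support and, for the simplified case of Bernoulli rewards, an upper-confidence index found by bisection. This is not the algorithm analysed in the paper: the function names, the Bernoulli restriction, and the `budget` parameter (for example, the logarithm of the current round) are all assumptions made for the example.

```python
import math


def kl_finite(p, q):
    """KL(p || q) for two distributions given as lists over the same finite support."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                # p puts mass where q has none: the divergence is infinite.
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return kl_finite([p, 1.0 - p], [q, 1.0 - q])


def kl_upper_index(mean, pulls, budget, tol=1e-9):
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= budget (hypothetical helper).

    Uses bisection, which is valid because KL(mean, q) is increasing in q on [mean, 1].
    """
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if pulls * kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # An arm with empirical mean 0.5 after 10 pulls, at round 100 (assumed budget log(100)).
    print(kl_upper_index(0.5, 10, math.log(100)))
```

In an index policy of this type, the arm with the largest index would be pulled next; the index shrinks toward the empirical mean as the number of pulls grows.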
Main file: 66-Maillard-Munos-Stoltz.pdf (283.93 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00574987, version 1 (09-03-2011)
inria-00574987, version 2 (27-05-2011)

Identifiers

  • HAL Id: inria-00574987, version 2
  • arXiv: 1105.5820

Cite

Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences. 24th Annual Conference on Learning Theory: COLT'11, Jul 2011, Budapest, Hungary. pp.18. ⟨inria-00574987v2⟩