A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

Odalric-Ambrym Maillard 1 Rémi Munos 1 Gilles Stoltz 2, 3, 4
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
4 CLASSIC - Computational Learning, Aggregation, Supervised Statistical, Inference, and Classification
DMA - Département de Mathématiques et Applications - ENS Paris, ENS Paris - École normale supérieure - Paris, Inria Paris-Rocquencourt
Abstract : We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of \cite{Burnetas96}. Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).
Complete list of metadatas

https://hal.inria.fr/inria-00574987
Contributor : Gilles Stoltz <>
Submitted on : Friday, May 27, 2011 - 10:44:43 PM
Last modification on : Tuesday, April 2, 2019 - 2:15:46 PM
Long-term archiving on : Sunday, December 4, 2016 - 12:53:05 PM

Files

66-Maillard-Munos-Stoltz.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00574987, version 2
  • ARXIV : 1105.5820

Citation

Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences. 24th Annual Conference on Learning Theory : COLT'11, Jul 2011, Budapest, Hungary. pp.18. ⟨inria-00574987v2⟩

Share

Metrics

Record views

674

Files downloads

300