A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Communication Dans Un Congrès Année : 2011

A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

Résumé

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of \cite{Burnetas96}. Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).
Fichier principal
Vignette du fichier
66-Maillard-Munos-Stoltz.pdf (283.93 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)

Dates et versions

inria-00574987 , version 1 (09-03-2011)
inria-00574987 , version 2 (27-05-2011)

Identifiants

  • HAL Id : inria-00574987 , version 2
  • ARXIV : 1105.5820

Citer

Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences. 24th Annual Conference on Learning Theory : COLT'11, Jul 2011, Budapest, Hungary. pp.18. ⟨inria-00574987v2⟩
396 Consultations
242 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More