A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

Odalric-Ambrym Maillard 1 Rémi Munos 1 Gilles Stoltz 2, 3, 4
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
4 CLASSIC - Computational Learning, Aggregation, Supervised Statistical, Inference, and Classification
DMA - Département de Mathématiques et Applications - ENS Paris, ENS Paris - École normale supérieure - Paris, Inria Paris-Rocquencourt
Abstract : We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of \cite{Burnetas96}. Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).
Type de document :
Communication dans un congrès
Sham Kakade & Ulrike von Luxburg. 24th Annual Conference on Learning Theory : COLT'11, Jul 2011, Budapest, Hungary. pp.18, 2011
Liste complète des métadonnées

https://hal.inria.fr/inria-00574987
Contributeur : Gilles Stoltz <>
Soumis le : vendredi 27 mai 2011 - 22:44:43
Dernière modification le : vendredi 25 mai 2018 - 12:02:06
Document(s) archivé(s) le : dimanche 4 décembre 2016 - 12:53:05

Fichiers

66-Maillard-Munos-Stoltz.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : inria-00574987, version 2
  • ARXIV : 1105.5820

Citation

Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences. Sham Kakade & Ulrike von Luxburg. 24th Annual Conference on Learning Theory : COLT'11, Jul 2011, Budapest, Hungary. pp.18, 2011. 〈inria-00574987v2〉

Partager

Métriques

Consultations de la notice

639

Téléchargements de fichiers

235