A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

Odalric-Ambrym Maillard; Rémi Munos; Gilles Stoltz

Conference paper. Year: 2011


Odalric-Ambrym Maillard

Role: Author
PersonId : 5563
IdHAL : odalric-ambrym-maillard
ORCID : 0000-0001-7935-7026
IdRef : 158055594

Sequential Learning

Rémi Munos

Role: Author

Sequential Learning

Gilles Stoltz

Role: Author
PersonId : 738739
IdHAL : gilles-stoltz
ORCID : 0000-0003-1240-1007
IdRef : 091575419

Département de Mathématiques et Applications - ENS Paris

Groupement de Recherche et d'Etudes en Gestion à HEC

Computational Learning, Aggregation, Supervised Statistical, Inference, and Classification

Abstract

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is a finite-time analysis of this algorithm; we obtain bounds whose main terms are smaller than those of previously known algorithms with finite-time analyses (such as UCB-type algorithms).
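To make the role of the Kullback-Leibler divergence concrete, here is a minimal sketch that is not taken from the paper. It assumes the simplest finite-support case, Bernoulli arms (support {0, 1}) with means strictly inside (0, 1), where the asymptotic lower bound on the regret divided by ln n takes the classical form of a sum over suboptimal arms of (mu* - mu_a) / kl(mu_a, mu*). The function names `bernoulli_kl`, `kl_lower_bound_constant`, and `gap_based_constant` are hypothetical. The second constant is what results from replacing kl by its Pinsker lower bound 2(mu* - mu_a)^2, which illustrates why a bound expressed with KL divergences can have a smaller main term than one expressed only through the gaps.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Ber(p) || Ber(q)) in nats, using the convention 0 * log(0 / x) = 0."""
    if not (0.0 <= p <= 1.0 and 0.0 < q < 1.0):
        raise ValueError("need 0 <= p <= 1 and 0 < q < 1")
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def kl_lower_bound_constant(means: list[float]) -> float:
    """Sum over suboptimal arms of gap / kl(mean, best mean) (Bernoulli case)."""
    best = max(means)
    return sum((best - m) / bernoulli_kl(m, best) for m in means if m < best)


def gap_based_constant(means: list[float]) -> float:
    """Same sum with kl replaced by its Pinsker lower bound 2 * gap**2."""
    best = max(means)
    return sum(1.0 / (2.0 * (best - m)) for m in means if m < best)


if __name__ == "__main__":
    # Illustrative arm means only; these values are assumptions, not paper data.
    arm_means = [0.9, 0.8, 0.5]
    print("KL-based constant: ", kl_lower_bound_constant(arm_means))
    print("Gap-based constant:", gap_based_constant(arm_means))
```

Because kl(p, q) is at least 2(p - q)^2 by Pinsker's inequality, the KL-based constant is never larger than the gap-based one, and the difference grows when the means lie close to 0 or 1.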

Domains

Statistics [math.ST] Theory [stat.TH] Machine Learning [cs.LG]

Main file

66-Maillard-Munos-Stoltz.pdf (283.93 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/inria-00574987

Submitted on: Friday, 27 May 2011, 22:44:43

Last modified on: Saturday, 27 April 2024, 03:14:13

Long-term archiving on: Sunday, 4 December 2016, 12:53:05

Dates and versions

inria-00574987 , version 1 (09-03-2011)

inria-00574987 , version 2 (27-05-2011)

Identifiers

HAL Id : inria-00574987 , version 2
ARXIV : 1105.5820

Cite

Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences. 24th Annual Conference on Learning Theory : COLT'11, Jul 2011, Budapest, Hungary. pp.18. ⟨inria-00574987v2⟩

Collections

ENS-PARIS HEC UNIV-LILLE3 CNRS INRIA INSMI LAGIS INRIA2 PSL MATH_ENS_PARIS

396 views

242 downloads
