21732 articles – 15570 references  [version française]

inria-00574987, version 2

A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

Odalric-Ambrym Maillard () 1, Rémi Munos 1, Gilles Stoltz 234

24th Annual Conference on Learning Theory : COLT'11 (2011) 18

Abstract: We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of \cite{Burnetas96}. Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).

  • 1:  SEQUEL (INRIA Lille - Nord Europe)
  • INRIA – CNRS : UMR8146 – Université Lille I - Sciences et technologies – Université Lille III - Sciences humaines et sociales – Ecole Centrale de Lille
  • 2:  Département de Mathématiques et Applications (DMA)
  • CNRS : UMR8553 – Ecole normale supérieure de Paris - ENS Paris
  • 3:  Groupement de Recherche et d'Etudes en Gestion à HEC (GREGH)
  • GROUPE HEC – CNRS : UMR2959
  • 4:  CLASSIC (INRIA Paris - Rocquencourt)
  • Ecole normale supérieure de Paris - ENS Paris – INRIA
 
  • inria-00574987, version 2
  • oai:hal.inria.fr:inria-00574987
  • From: 
  • Submitted on: Friday, 27 May 2011 22:44:43
  • Updated on: Tuesday, 14 June 2011 11:34:34