inria-00574987, version 2
A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences
24th Annual Conference on Learning Theory : COLT'11 (2011) 18
Abstract: We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of \cite{Burnetas96}. Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).
- 1:
- INRIA – CNRS : UMR8146 – Université Lille I - Sciences et technologies – Université Lille III - Sciences humaines et sociales – Ecole Centrale de Lille
- 2:
- CNRS : UMR8553 – Ecole normale supérieure de Paris - ENS Paris
- 3:
- GROUPE HEC – CNRS : UMR2959
- 4:
- Ecole normale supérieure de Paris - ENS Paris – INRIA
- Domain : Mathematics/Statistics
Statistics/Statistics Theory
Computer Science/Learning - Available versions : v1 (2011-03-09) v2 (2011-05-29)
- inria-00574987, version 2
- http://hal.inria.fr/inria-00574987
- oai:hal.inria.fr:inria-00574987
- From:
- Submitted on: Friday, 27 May 2011 22:44:43
- Updated on: Tuesday, 14 June 2011 11:34:34



Associated documents

See also
Export