A minimax and asymptotically optimal algorithm for stochastic bandits

Abstract : We propose the kl-UCB ++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with simple and clear proofs.
Complete list of metadatas

Cited literature [19 references]  Display  Hide  Download

https://hal.archives-ouvertes.fr/hal-01475078
Contributor : Pierre Ménard <>
Submitted on : Tuesday, September 19, 2017 - 5:39:41 PM
Last modification on : Friday, October 25, 2019 - 2:04:46 AM

Files

main.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : hal-01475078, version 2
  • ARXIV : 1702.07211

Citation

Pierre Ménard, Aurélien Garivier. A minimax and asymptotically optimal algorithm for stochastic bandits. Algorithmic Learning Theory, Springer, 2017, 2017 Algorithmic Learning Theory Conference 76. ⟨hal-01475078v2⟩

Share

Metrics

Record views

266

Files downloads

163