Skip to Main content Skip to Navigation
Preprints, Working Papers, ...

A minimax and asymptotically optimal algorithm for stochastic bandits

Abstract : We propose the kl-UCB ++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research, with simple proofs involving no complexity overhead.
Complete list of metadatas

Cited literature [19 references]  Display  Hide  Download
Contributor : Pierre Ménard <>
Submitted on : Thursday, February 23, 2017 - 2:32:33 PM
Last modification on : Thursday, March 5, 2020 - 3:08:03 PM
Long-term archiving on: : Wednesday, May 24, 2017 - 2:01:58 PM


Files produced by the author(s)


  • HAL Id : hal-01475078, version 1
  • ARXIV : 1702.07211


Pierre Ménard, Aurélien Garivier. A minimax and asymptotically optimal algorithm for stochastic bandits. 2017. ⟨hal-01475078v1⟩



Record views


Files downloads