inria-00173263, version 1
Anytime many-armed bandits
Olivier Teytaud
1Sylvain Gelly 1Michèle Sebag
1
CAP07 (2007)
Abstract: This paper introduces the many-armed bandit problem (ManAB), where the number of arms is large comparatively to the relevant number of time steps. While the ManAB framework is relevant to many real-world applications, the state of the art does not offer anytime algorithms handling ManAB problems. Both theory and practice suggest that two problem categories must be distinguished; the easy category includes those problems where good arms have reward probability close to 1; the difficult category includes other problems. Two algorithms termed FAILURE and MUCBT are proposed for the ManAB framework. FAILURE and its variants extend the non-anytime approach proposed for the denumerable-armed bandit and non-asymptotic bounds are shown; it works very efficiently for easy ManAB problems. Meanwhile, MUCBT efficiently deals with difficult ManAB problems.
- 1: TAO (INRIA Futurs)
- INRIA – CNRS : UMR8623 – Université Paris XI - Paris Sud
- Domain : Computer Science/Computer Science and Game Theory
- inria-00173263, version 1
- http://hal.inria.fr/inria-00173263
- oai:hal.inria.fr:inria-00173263
- From: Olivier Teytaud
- Submitted on: Wednesday, 19 September 2007 14:20:51
- Updated on: Wednesday, 19 September 2007 14:21:26






Associated documents
Export