sign in
english version rss feed

inria-00173263, version 1

Anytime many-armed bandits

Olivier Teytaud () 1, Sylvain Gelly 1, Michèle Sebag () 1

CAP07 (2007)

Abstract: This paper introduces the many-armed bandit problem (ManAB), where the number of arms is large comparatively to the relevant number of time steps. While the ManAB framework is relevant to many real-world applications, the state of the art does not offer anytime algorithms handling ManAB problems. Both theory and practice suggest that two problem categories must be distinguished; the easy category includes those problems where good arms have reward probability close to 1; the difficult category includes other problems. Two algorithms termed FAILURE and MUCBT are proposed for the ManAB framework. FAILURE and its variants extend the non-anytime approach proposed for the denumerable-armed bandit and non-asymptotic bounds are shown; it works very efficiently for easy ManAB problems. Meanwhile, MUCBT efficiently deals with difficult ManAB problems.

  • 1:  TAO (INRIA Futurs)
  • INRIA – CNRS : UMR8623 – Université Paris XI - Paris Sud
  • Domain : Computer Science/Computer Science and Game Theory
 
  • inria-00173263, version 1
  • oai:hal.inria.fr:inria-00173263
  • From: 
  • Submitted on: Wednesday, 19 September 2007 14:20:51
  • Updated on: Wednesday, 19 September 2007 14:21:26
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...
all articles on CCSd database...