Abstract: This paper introduces the many-armed bandit problem (ManAB), where the number of arms is large compared to the relevant number of time steps. While the ManAB framework is relevant to many real-world applications, the state of the art offers no anytime algorithms for ManAB problems. Both theory and practice suggest that two problem categories must be distinguished: the easy category includes problems where good arms have reward probability close to 1; the difficult category includes all other problems. Two algorithms, termed FAILURE and MUCBT, are proposed for the ManAB framework. FAILURE and its variants extend the non-anytime approach proposed for the denumerable-armed bandit, and non-asymptotic bounds are shown; they work very efficiently on easy ManAB problems. Meanwhile, MUCBT efficiently handles difficult ManAB problems.
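To make the ManAB difficulty concrete, the following is a minimal sketch of a standard UCB1 baseline (not the paper's FAILURE or MUCBT algorithms, whose details the abstract does not specify) on a Bernoulli bandit where the arms outnumber the time steps. The arm probabilities and horizon below are hypothetical illustration values; the sketch shows why a classical index policy degenerates to pure exploration in this regime.

```python
import math
import random

def ucb1(arm_probs, horizon, rng):
    """Generic UCB1 index policy on a Bernoulli bandit.

    This is only an illustrative baseline for the ManAB setting,
    not the FAILURE or MUCBT algorithms from the paper.
    """
    n = len(arm_probs)
    counts = [0] * n    # number of pulls per arm
    sums = [0.0] * n    # cumulative reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        # UCB1 pulls each arm once before using the index; with more
        # arms than steps this phase never ends, which is precisely
        # the difficulty of the many-armed regime.
        untried = [i for i in range(n) if counts[i] == 0]
        if untried:
            arm = untried[0]
        else:
            # Index: empirical mean plus exploration bonus.
            arm = max(range(n), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward, counts

rng = random.Random(0)
# Hypothetical "easy" ManAB instance: 500 arms, a few with reward
# probability close to 1, and a horizon shorter than the arm count.
probs = [0.95 if i % 50 == 0 else 0.2 for i in range(500)]
reward, counts = ucb1(probs, horizon=200, rng=rng)
```

With 500 arms and only 200 steps, UCB1 spends the whole horizon pulling untried arms and never exploits a good one, which is the behavior the ManAB algorithms are designed to avoid.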
https://hal.inria.fr/inria-00173263 — Contributor: Olivier Teytaud. Submitted on September 19, 2007.