
Adaptive Bandits: Towards the best history-dependent strategy

Odalric-Ambrym Maillard 1 Rémi Munos 1
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : We consider multi-armed bandit games with possibly adaptive opponents. We introduce models $\Theta$ of constraints based on equivalence classes on the common history (information shared by the player and the opponent) which define two learning scenarios: (1) The opponent is constrained, i.e. he provides rewards that are stochastic functions of equivalence classes defined by some model $\theta^* \in \Theta$. The regret is measured with respect to (w.r.t.) the best history-dependent strategy. (2) The opponent is arbitrary and we measure the regret w.r.t. the best strategy among all mappings from classes to actions (i.e. the best history-class-based strategy) for the best model in $\Theta$. This makes it possible to model opponents (case 1) or strategies (case 2) that handle finite memory, periodicity, standard stochastic bandits and other situations. When $\Theta = \{\theta\}$, i.e. only one model is considered, we derive tractable algorithms achieving a tight regret (at time $T$) bounded by $\tilde{O}(\sqrt{TAC})$, where $C$ is the number of classes of $\theta$. When many models are available, all known algorithms achieving a regret of order $O(\sqrt{T})$ are unfortunately intractable and scale poorly with the number of models $|\Theta|$. Our contribution here is to provide tractable algorithms with regret bounded by $T^{2/3} C^{1/3} \log(|\Theta|)^{1/2}$.
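To make the single-model setting concrete, here is a minimal Python sketch of one natural way to play against a history-dependent opponent: map the common history to an equivalence class, and run an independent Exp3 learner in each class. This is an illustration under stated assumptions, not necessarily the algorithm analysed in the report. The names `Exp3`, `last_action_class`, `switching_opponent` and `play`, the choice of model (class = previous action, i.e. memory 1) and the fixed `gamma=0.1` are all hypothetical and untuned.

```python
import math
import random
from collections import defaultdict


class Exp3:
    """Standard Exp3 learner over n_actions arms, rewards assumed in [0, 1]."""

    def __init__(self, n_actions, gamma, rng):
        self.n = n_actions
        self.gamma = gamma          # exploration rate, also used as learning rate
        self.rng = rng
        self.log_w = [0.0] * n_actions  # log-weights, for numerical stability

    def probs(self):
        m = max(self.log_w)
        w = [math.exp(lw - m) for lw in self.log_w]
        s = sum(w)
        # Mix the exponential-weights distribution with uniform exploration.
        return [(1 - self.gamma) * wi / s + self.gamma / self.n for wi in w]

    def act(self):
        p = self.probs()
        a = self.rng.choices(range(self.n), weights=p)[0]
        return a, p[a]

    def update(self, action, prob, reward):
        # Importance-weighted reward estimate for the action actually played.
        self.log_w[action] += self.gamma * (reward / prob) / self.n


def last_action_class(history):
    """Hypothetical model theta: the class of a history is the previous action."""
    return history[-1][0] if history else None


def switching_opponent(history, action, rng):
    """Hypothetical memory-1 opponent: pays 1 when the action changes."""
    if not history:
        return 0.0
    return 1.0 if action != history[-1][0] else 0.0


def play(n_actions, horizon, class_of, reward_fn, gamma=0.1, seed=0):
    """Run one Exp3 instance per history class; return total reward and history."""
    rng = random.Random(seed)
    learners = defaultdict(lambda: Exp3(n_actions, gamma, rng))
    history, total = [], 0.0
    for _ in range(horizon):
        c = class_of(history)               # equivalence class of the history
        a, p = learners[c].act()
        r = reward_fn(history, a, rng)
        learners[c].update(a, p, r)
        history.append((a, r))
        total += r
    return total, history


if __name__ == "__main__":
    total, _ = play(2, 5000, last_action_class, switching_opponent)
    print(f"average reward: {total / 5000:.3f}")
```

Against this opponent, any constant action earns almost nothing, whereas the best history-dependent strategy alternates between the two actions. The per-class learners can discover the alternation because each class sees a fixed reward function, which is the sense in which the history classes reduce the problem to ordinary bandits.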


Contributor : Odalric-Ambrym Maillard
Submitted on : Wednesday, March 9, 2011 - 1:57:08 PM
Last modification on : Thursday, January 20, 2022 - 4:16:32 PM
Long-term archiving on : Tuesday, November 6, 2012 - 3:56:07 PM


Files produced by the author(s)


  • HAL Id : inria-00574999, version 1



Odalric-Ambrym Maillard, Rémi Munos. Adaptive Bandits: Towards the best history-dependent strategy. [Technical Report] 2011, pp.14. ⟨inria-00574999⟩


