HAL will be down for maintenance from Friday, June 10 at 4pm through Monday, June 13 at 9am. More information
Skip to Main content Skip to Navigation
Conference papers

Online Optimization in X-Armed Bandits

Sébastien Bubeck 1 Rémi Munos 1 Gilles Stoltz 2, 3 Csaba Szepesvari 4
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space. We constraint the mean-payoff function with a dissimilarity function over X in a way that is more general than Lipschitz. We construct an arm selection policy whose regret improves upon previous result for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Holder with a known exponent, then the expected regret is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of the growth of the regret is independent of the dimension of the space. Moreover, we prove the minimax optimality of our algorithm for the class of mean-payoff functions we consider.
Complete list of metadata

Cited literature [8 references]  Display  Hide  Download

Contributor : Sébastien Bubeck Connect in order to contact the contributor
Submitted on : Monday, October 13, 2008 - 2:28:01 PM
Last modification on : Thursday, March 17, 2022 - 10:08:13 AM
Long-term archiving on: : Tuesday, October 9, 2012 - 12:02:21 PM


Files produced by the author(s)


  • HAL Id : inria-00329797, version 1


Sébastien Bubeck, Rémi Munos, Gilles Stoltz, Csaba Szepesvari. Online Optimization in X-Armed Bandits. Twenty-Second Annual Conference on Neural Information Processing Systems, Dec 2008, Vancouver, Canada. ⟨inria-00329797⟩



Record views


Files downloads