Online Optimization in X-Armed Bandits

1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract : We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space. We constraint the mean-payoff function with a dissimilarity function over X in a way that is more general than Lipschitz. We construct an arm selection policy whose regret improves upon previous result for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Holder with a known exponent, then the expected regret is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of the growth of the regret is independent of the dimension of the space. Moreover, we prove the minimax optimality of our algorithm for the class of mean-payoff functions we consider.
Type de document :
Communication dans un congrès
Twenty-Second Annual Conference on Neural Information Processing Systems, Dec 2008, Vancouver, Canada. 2008
Domaine :

Littérature citée [8 références]

https://hal.inria.fr/inria-00329797
Contributeur : Sébastien Bubeck <>
Soumis le : lundi 13 octobre 2008 - 14:28:01
Dernière modification le : mardi 24 avril 2018 - 17:20:06
Document(s) archivé(s) le : mardi 9 octobre 2012 - 12:02:21

Fichier

HOO_non-anonymous.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

• HAL Id : inria-00329797, version 1

Citation

Sébastien Bubeck, Rémi Munos, Gilles Stoltz, Csaba Szepesvari. Online Optimization in X-Armed Bandits. Twenty-Second Annual Conference on Neural Information Processing Systems, Dec 2008, Vancouver, Canada. 2008. 〈inria-00329797〉

Métriques

Consultations de la notice

615

Téléchargements de fichiers