Online Optimization in X-Armed Bandits - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Communication Dans Un Congrès Année : 2008

Online Optimization in X-Armed Bandits

Résumé

We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space. We constraint the mean-payoff function with a dissimilarity function over X in a way that is more general than Lipschitz. We construct an arm selection policy whose regret improves upon previous result for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Holder with a known exponent, then the expected regret is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of the growth of the regret is independent of the dimension of the space. Moreover, we prove the minimax optimality of our algorithm for the class of mean-payoff functions we consider.
Fichier principal
Vignette du fichier
HOO_non-anonymous.pdf (156.57 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

inria-00329797 , version 1 (13-10-2008)

Identifiants

  • HAL Id : inria-00329797 , version 1

Citer

Sébastien Bubeck, Rémi Munos, Gilles Stoltz, Csaba Szepesvari. Online Optimization in X-Armed Bandits. Twenty-Second Annual Conference on Neural Information Processing Systems, Dec 2008, Vancouver, Canada. ⟨inria-00329797⟩
1651 Consultations
788 Téléchargements

Partager

Gmail Facebook X LinkedIn More