Structure Adaptive Algorithms for Stochastic Bandits

Rémy Degenne; Han Shao; Wouter M. Koolen

Conference paper, Year: 2020


Rémy Degenne

Role: Author

Statistical Machine Learning and Parsimony

Han Shao

Role: Author

Toyota Technological Institute

Wouter M. Koolen

Role: Author

Centrum voor Wiskunde en Informatica

Abstract

We study reward maximisation in a wide class of structured stochastic multi-armed bandit problems, where the mean rewards of arms satisfy some given structural constraints, e.g. linear, unimodal or sparse. Our aim is to develop methods that are flexible (in that they easily adapt to different structures), powerful (in that they perform well empirically and/or provably match instance-dependent lower bounds) and efficient (in that the per-round computational burden is small). We develop asymptotically optimal algorithms from instance-dependent lower bounds using iterative saddle-point solvers. Our approach generalises recent iterative methods for pure exploration to reward maximisation, where a major challenge arises from the estimation of the sub-optimality gaps and their reciprocals. Still, we manage to achieve all the above desiderata. Notably, our technique avoids the computational cost of the full-blown saddle-point oracle employed by previous work, while at the same time enabling finite-time regret bounds. Our experiments reveal that our method successfully leverages the structural assumptions, while its regret is at worst comparable to that of vanilla UCB.
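
The abstract uses vanilla UCB as the reference point for regret. For readers unfamiliar with that baseline, the following is a minimal sketch of the standard UCB1 index policy. It is not the structure-adaptive method of the paper. The Bernoulli rewards, the arm means, the horizon and the function name `ucb1` are all assumptions chosen for illustration.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given (hypothetical) means.

    Returns the number of pulls per arm and the cumulative pseudo-regret,
    i.e. the sum over rounds of (best mean - mean of the arm pulled).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls of each arm
    sums = [0.0] * k      # total reward collected from each arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest empirical mean plus bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return counts, regret


# Illustrative run on three unstructured arms (assumed means).
counts, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

UCB1 treats the arms as unrelated: it never uses a constraint such as linearity or unimodality to rule out arms early, which is the kind of structural information the abstract says the proposed method exploits.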

Domains

Machine Learning [stat.ML]


https://inria.hal.science/hal-03830727

Submitted on: Wednesday, 26 October 2022, 15:04:08

Last modified on: Friday, 19 April 2024, 16:18:58

Dates and versions

hal-03830727, version 1 (26-10-2022)

Identifiers

HAL Id: hal-03830727, version 1
arXiv: 2007.00969

Cite

Rémy Degenne, Han Shao, Wouter M. Koolen. Structure Adaptive Algorithms for Stochastic Bandits. ICML 2020 - Thirty-seventh International Conference on Machine Learning, Jul 2020, Online, United States. ⟨hal-03830727⟩


Collections

ENS-PARIS CNRS INRIA INRIA2 PSL ANR PRAIRIE-IA

