Rotting bandits are not harder than stochastic ones

Julien Seznec; Andrea Locatelli; Alexandra Carpentier; Alessandro Lazaric; Michal Valko

Conference paper. Year: 2019

Julien Seznec

Role: Author

Sequential Learning

Lelivrescolaire.fr

Andrea Locatelli

Role: Author

Otto-von-Guericke-Universität Magdeburg = Otto-von-Guericke University [Magdeburg]

Alexandra Carpentier

Role: Author
PersonId: 910455

Otto-von-Guericke-Universität Magdeburg = Otto-von-Guericke University [Magdeburg]

Alessandro Lazaric

Role: Author
PersonId: 851
IdHAL: alessandro-lazaric
ORCID: 0000-0002-8970-413X
IdRef: 188701486

Sequential Learning

Facebook AI Research [Paris]

Michal Valko

Role: Author
PersonId: 284
IdHAL: michal
IdRef: 22360934X

Sequential Learning

DeepMind [Paris]

Abstract

In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be stationary. This assumption is often violated in practice (e.g., in recommendation systems), where the reward of an arm may change whenever it is selected, i.e., the rested bandit setting. In this paper, we consider the non-parametric rotting bandit setting, where rewards can only decrease. We introduce the filtering on expanding window average (FEWA) algorithm, which constructs moving averages over increasing windows to identify arms that are more likely to return high rewards when pulled once more. We prove that for an unknown horizon $T$, and without any knowledge of the decreasing behavior of the $K$ arms, FEWA achieves a problem-dependent regret bound of $\widetilde{\mathcal{O}}(\log(KT))$ and a problem-independent one of $\widetilde{\mathcal{O}}(\sqrt{KT})$. Our result substantially improves over the algorithm of Levine et al. (2017), which suffers regret $\widetilde{\mathcal{O}}(K^{1/3}T^{2/3})$. FEWA also matches known bounds for the stochastic bandit setting, thus showing that rotting bandits are not harder. Finally, we report simulations confirming the theoretical improvements of FEWA.
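The expanding-window filtering idea can be illustrated with a short Python sketch. This is a minimal, hypothetical sketch, not the authors' reference implementation. It assumes that every arm has already been pulled at least once, that reward noise is sub-Gaussian with a known scale `sigma`, and that the confidence width is c(h, δ) = sqrt(2σ² log(1/δ)/h) with δ_t = 1/(K t^α). The names `confidence_width` and `fewa_select` and the default `alpha` are illustrative assumptions.

```python
import math


def confidence_width(h, sigma, delta):
    # Assumed sub-Gaussian deviation bound for an average of h samples:
    # c(h, delta) = sqrt(2 * sigma^2 * log(1/delta) / h).
    return math.sqrt(2.0 * sigma ** 2 * math.log(1.0 / delta) / h)


def fewa_select(histories, t, sigma=1.0, alpha=4.0):
    """Pick an arm at round t (hypothetical sketch of FEWA-style filtering).

    histories[i] lists arm i's rewards, oldest first; every arm is assumed
    to have at least one reward.
    """
    k = len(histories)
    delta = 1.0 / (k * t ** alpha)  # assumed confidence schedule
    active = list(range(k))
    h = 1
    while True:
        # Average of the h most recent rewards of each surviving arm.
        means = {i: sum(histories[i][-h:]) / h for i in active}
        best = max(means.values())
        c = confidence_width(h, sigma, delta)
        # Keep arms whose window-h average is within 2c of the best one;
        # the best arm always survives, so `active` is never empty.
        active = [i for i in active if best - means[i] <= 2 * c]
        # If a survivor has exactly h rewards, the window cannot grow any
        # further: return the least-pulled surviving arm.
        if any(len(histories[i]) == h for i in active):
            return min(active, key=lambda i: len(histories[i]))
        h += 1


# Arm 0 has the higher full-history mean (0.75 vs 0.5), but its latest
# reward has dropped, so the window-1 filter removes it.
print(fewa_select([[1.0, 1.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5]], t=10, sigma=0.01))
```

In this example the sketch selects arm 1, whereas a rule based on the full-history average would prefer arm 0. Short windows react quickly to rotting, while longer, less noisy windows are consulted only for arms that survive every earlier filter.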

Domains

Machine Learning [stat.ML]

Main file

seznec2019rotting.pdf (7.26 MB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01936894

Submitted on: Saturday, 9 May 2020, 20:48:20

Last modified on: Wednesday, 24 January 2024, 09:54:23

Dates and versions

hal-01936894 , version 1 (27-11-2018)

hal-01936894 , version 2 (09-05-2020)

Identifiers

HAL Id: hal-01936894, version 2

Cite

Julien Seznec, Andrea Locatelli, Alexandra Carpentier, Alessandro Lazaric, Michal Valko. Rotting bandits are not harder than stochastic ones. International Conference on Artificial Intelligence and Statistics, 2019, Naha, Japan. ⟨hal-01936894v2⟩

Collections

CNRS INRIA GRID5000 CRISTAL INRIA2 CRISTAL-SEQUEL UNIV-LILLE SILECS ANR

219 views

214 downloads
