A single algorithm for both restless and rested rotting bandits

Julien Seznec; Pierre Menard; Alessandro Lazaric; Michal Valko

Conference paper, Year: 2020

Julien Seznec

Role: Author

Lelivrescolaire.fr

Scool

Pierre Menard

Role: Author

Scool

Alessandro Lazaric

Role: Author

Facebook AI Research [Paris]

Michal Valko

Role: Author
PersonId: 284
IdHAL: michal
IdRef: 22360934X

DeepMind [Paris]

Abstract

In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated with the actions tend to decrease over time. This decay is caused either by the actions executed in the past (e.g., a user may get bored when songs of the same genre are recommended over and over) or by an external factor (e.g., content becomes outdated). These two situations can be modeled as specific instances of the rested and restless bandit settings, where arms are rotting (i.e., their value decreases over time). These problems were thought to be significantly different, since Levine et al. (2017) showed that state-of-the-art algorithms for restless bandits perform poorly in the rested rotting setting. In this paper, we introduce a novel algorithm, Rotting Adaptive Window UCB (RAW-UCB), that achieves near-optimal regret in both rotting rested and restless bandits, without any prior knowledge of the setting (rested or restless) or of the type of non-stationarity (e.g., piece-wise constant, bounded variation). This is in striking contrast with previous negative results showing that no algorithm can achieve similar results as soon as rewards are allowed to increase. We confirm our theoretical findings on a number of synthetic and dataset-based experiments.
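
The difference between the rested and restless rotting settings described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's experimental setup and not RAW-UCB itself: the class `RottingArm`, the linear decay model, and all parameter names are hypothetical and introduced here only for illustration.

```python
import random


class RottingArm:
    """Toy arm whose expected reward decays linearly (hypothetical model)."""

    def __init__(self, initial_mean, decay, rested, noise_std=0.0, seed=0):
        self.initial_mean = initial_mean
        self.decay = decay          # reward lost per tick of the arm's clock
        self.rested = rested        # True: rested rotting, False: restless rotting
        self.noise_std = noise_std
        self.pulls = 0
        self._rng = random.Random(seed)

    def mean(self, t):
        # Rested: decay is driven by how often this arm was pulled.
        # Restless: decay is driven by the global round index t.
        clock = self.pulls if self.rested else t
        return self.initial_mean - self.decay * clock

    def pull(self, t):
        reward = self.mean(t) + self._rng.gauss(0.0, self.noise_std)
        self.pulls += 1
        return reward


rested = RottingArm(1.0, 0.1, rested=True)
restless = RottingArm(1.0, 0.1, rested=False)

# Leave both arms untouched until round 5, then compare expected rewards.
print(rested.mean(5))    # unchanged: the rested arm was never pulled
print(restless.mean(5))  # lower: the restless arm decayed with time alone
```

In this sketch an unplayed rested arm keeps its initial value, whereas a restless arm loses value whether or not it is played, which is why a learner that does not know the setting in advance cannot simply ignore old observations or simply trust them.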

Domains

Machine Learning [stat.ML]

Main file

seznec2020single.pdf (28.34 MB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-03287835

Submitted on: Thursday, 15 July 2021, 21:55:52

Last modified on: Wednesday, 24 January 2024, 09:54:24

Long-term archiving on: Saturday, 16 October 2021, 19:17:36

Dates and versions

hal-03287835, version 1 (15-07-2021)

Identifiers

HAL Id: hal-03287835, version 1

Cite

Julien Seznec, Pierre Menard, Alessandro Lazaric, Michal Valko. A single algorithm for both restless and rested rotting bandits. International Conference on Artificial Intelligence and Statistics, Aug 2020, Palermo / Virtual, Italy. ⟨hal-03287835⟩

Collections

CNRS INRIA CRISTAL INRIA2 UNIV-LILLE CRISTAL-SCOOL
