Nonstochastic Bandits with Composite Anonymous Feedback

Nicolo Cesa-Bianchi; Claudio Gentile; Yishay Mansour

Conference paper. Year: 2018

Nicolo Cesa-Bianchi

Role: Author

Università degli Studi di Milano = University of Milan

Claudio Gentile

Role: Author
PersonId : 1038675

Machine Learning in Information Networks

Google Inc

Yishay Mansour

Role: Author

Tel Aviv University

Google Inc

Abstract

We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over at most d consecutive steps in an adversarial way. This implies that the instantaneous loss observed by the player at the end of each round is a sum of as many as d loss components of previously played actions. Hence, unlike the standard bandit setting with delayed feedback, here the player cannot observe the individual delayed losses, but only their sum. Our main contribution is a general reduction transforming a standard bandit algorithm into one that can operate in this harder setting. We also show how the regret of the transformed algorithm can be bounded in terms of the regret of the original algorithm. Our reduction cannot be improved in general: we prove a lower bound on the regret of any bandit algorithm in this setting that matches (up to log factors) the upper bound obtained via our reduction. Finally, we show how our reduction can be extended to more complex bandit settings, such as combinatorial linear bandits and online bandit convex optimization.
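The feedback model described in the abstract can be made concrete with a short simulation. This is a minimal sketch under stated assumptions, not the authors' code and not their reduction: it assumes the adversary splits each action's loss at round t into at most d nonnegative parts, the s-th of which is charged at round t + s, and that the player observes only the sum of the parts that land on the current round. The function name `composite_anonymous_feedback` and its nested-list argument layout are hypothetical.

```python
from typing import List, Sequence


def composite_anonymous_feedback(
    actions: Sequence[int],
    components: Sequence[Sequence[Sequence[float]]],
    d: int,
) -> List[float]:
    """Return the loss the player observes at each round (rounds are 0-based).

    Hypothetical layout (an assumption, not taken from the paper):
    actions[t]       -- action played at round t.
    components[t][i] -- the adversary's split of action i's loss at round t into
                        at most d parts; part s is charged at round t + s.
    """
    horizon = len(actions)
    observed = [0.0] * horizon
    for t, action in enumerate(actions):
        parts = components[t][action]
        if len(parts) > d:
            raise ValueError(f"round {t}: loss spread over more than d={d} steps")
        for s, part in enumerate(parts):
            if t + s < horizon:  # parts charged after the last round are never seen
                observed[t + s] += part
    # Only these per-round sums are revealed; which past action produced each
    # part stays hidden, which is what makes the feedback "anonymous".
    return observed


# Two actions, d = 2, three rounds (values chosen to be exact in binary floating point).
components = [
    [[0.25, 0.25], [0.5, 0.0]],    # round 0: split of action 0, split of action 1
    [[0.125, 0.375], [0.0, 0.5]],  # round 1
    [[0.25, 0.0], [0.5, 0.25]],    # round 2
]
actions = [0, 1, 1]
observed = composite_anonymous_feedback(actions, components, d=2)
print(observed)  # [0.25, 0.25, 1.0]
```

In this example the observed loss at the third round (1.0) is the delayed part of the second round's action plus the immediate part of the third round's action, and the player cannot tell the two apart. This is the information that standard delayed feedback would reveal and that this setting withholds.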

Keywords

Nonstochastic bandits; composite losses; delayed feedback; bandit convex optimization

Domains

Computer Science [cs] / Machine Learning [cs.LG]

Main file

colt2018.pdf (404.69 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01916981

Submitted on: Friday, November 9, 2018, 03:48:52

Last modified on: Wednesday, January 24, 2024, 09:54:24

Long-term archiving on: Sunday, February 10, 2019, 12:21:10

Dates and versions

hal-01916981 , version 1 (09-11-2018)

Identifiers

HAL Id: hal-01916981, version 1

Cite

Nicolo Cesa-Bianchi, Claudio Gentile, Yishay Mansour. Nonstochastic Bandits with Composite Anonymous Feedback. COLT 2018 - 31st Annual Conference on Learning Theory, Jul 2018, Stockholm, Sweden. pp. 1-23. ⟨hal-01916981⟩

Collections

CNRS INRIA CRISTAL INRIA2 CRISTAL-MAGNET UNIV-LILLE
