Gradient-free Online Learning in Games with Delayed Rewards

Amélie Héliou; Panayotis Mertikopoulos; Zhengyuan Zhou

Conference paper, Year: 2020


Amélie Héliou

Role: Author
PersonId: 14758
IdHAL: amelie-heliou

Criteo AI Lab

Panayotis Mertikopoulos

Role: Author
PersonId: 1933
IdHAL: mertikop
ORCID: 0000-0003-2026-9616
IdRef: 253119758

Performance analysis and optimization of LARge Infrastructures and Systems

Criteo AI Lab

Zhengyuan Zhou

Role: Author

New York University [New York]

IBM Watson Research Center

Abstract

Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback. In contrast to previous work on delayed multi-armed bandits, we focus on multi-player games with continuous action spaces, and we examine the long-run behavior of strategic agents that follow a no-regret learning policy but are otherwise oblivious to the game being played, the objectives of their opponents, and so on. To account for the lack of a consistent stream of information (for instance, rewards can arrive out of order and with an a priori unbounded delay), we introduce a gradient-free learning policy where payoff information is placed in a priority queue as it arrives. In this general context, we derive new bounds for the agents' regret; furthermore, under a standard diagonal concavity assumption, we show that the induced sequence of play converges to Nash equilibrium (NE) with probability 1, even if the delay between choosing an action and receiving the corresponding reward is unbounded.
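The idea of queueing delayed payoffs and learning from them without gradients can be illustrated with a minimal single-player sketch. This is not the paper's algorithm, only one plausible reading of the abstract: the agent plays a randomly perturbed action, observed rewards are released from a priority queue oldest-first, and each released reward is turned into a one-point gradient estimate. Everything specific here is an assumption made for illustration: the function names, the box-shaped action space [-1, 1]^dim, the step-size and query-radius schedules, and the rule of using at most one queued reward per round.

```python
import heapq
import math
import random


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly at random from the unit sphere in R^dim."""
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in z))
        if norm > 0.0:
            return [c / norm for c in z]


def project_box(x, radius):
    """Euclidean projection onto the box [-radius, radius]^dim."""
    return [min(max(c, -radius), radius) for c in x]


def delayed_payoff_based_ascent(payoff, dim, horizon, delay_fn, seed=0):
    """Hypothetical sketch: payoff-based ascent with delayed, queued rewards.

    payoff(action) returns a scalar reward; delay_fn(t) returns a
    non-negative integer delay for the reward generated at round t.
    """
    rng = random.Random(seed)
    x = [0.0] * dim   # pivot point, kept inside [-0.5, 0.5]^dim
    in_transit = []   # min-heap keyed by the round in which a reward arrives
    queue = []        # min-heap keyed by the round in which the action was played

    for t in range(1, horizon + 1):
        gamma = 1.0 / t           # step size (assumed schedule)
        delta = 0.5 / t ** 0.25   # query radius (assumed schedule), at most 0.5

        # Play a perturbed action; it stays in [-1, 1]^dim by construction.
        w = sample_unit_sphere(dim, rng)
        action = [xi + delta * wi for xi, wi in zip(x, w)]
        reward = payoff(action)
        heapq.heappush(in_transit, (t + delay_fn(t), t, reward, delta, w))

        # Rewards that have arrived by round t enter the priority queue.
        while in_transit and in_transit[0][0] <= t:
            _, s, r, delta_s, w_s = heapq.heappop(in_transit)
            heapq.heappush(queue, (s, r, delta_s, w_s))

        # Use the oldest available reward, if any; otherwise skip the update.
        if queue:
            _, r, delta_s, w_s = heapq.heappop(queue)
            scale = gamma * (dim / delta_s) * r   # one-point gradient estimate
            x = project_box([xi + scale * wi for xi, wi in zip(x, w_s)], 0.5)

    return x
```

A call such as `delayed_payoff_based_ascent(lambda a: -sum((c - 0.3) ** 2 for c in a), dim=2, horizon=5000, delay_fn=lambda t: 5)` runs the sketch on a concave test payoff with a constant delay of five rounds. Keying the queue by the round of play means that rewards arriving out of order are still consumed oldest-first.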

Domains

Optimization and control [math.OC]

Main file

ICML-2020-gradient-free-online-learning-in-continuous-games-with-delayed-rewards-Paper.pdf (332.46 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-03043703

Submitted on: Monday, December 7, 2020, 14:02:05

Last modified on: Friday, April 5, 2024, 03:09:47

Long-term archiving on: Monday, March 8, 2021, 19:05:37

Dates and versions

hal-03043703, version 1 (7 December 2020)

Identifiers

HAL Id: hal-03043703, version 1

Cite

Amélie Héliou, Panayotis Mertikopoulos, Zhengyuan Zhou. Gradient-free Online Learning in Games with Delayed Rewards. ICML 2020 - 37th International Conference on Machine Learning, Jul 2020, Vienna, Austria. pp.1-11. ⟨hal-03043703⟩

Collections

UGA CNRS INRIA LIG LIG_SRCPR INRIA2 TDS-MACS LIG-SRCPR-POLARIS MIAI ANR LIG_SIDCH

68 views

248 downloads