Gradient-free Online Learning in Games with Delayed Rewards - Inria (French National Institute for Research in Digital Science and Technology)
Conference paper, Year: 2020

Gradient-free Online Learning in Games with Delayed Rewards

Abstract

Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback. In contrast to previous work on delayed multi-armed bandits, we focus on multi-player games with continuous action spaces, and we examine the long-run behavior of strategic agents that follow a no-regret learning policy (but are otherwise oblivious to the game being played, the objectives of their opponents, etc.). To account for the lack of a consistent stream of information (for instance, rewards can arrive out of order, with an a priori unbounded delay, etc.), we introduce a gradient-free learning policy where payoff information is placed in a priority queue as it arrives. In this general context, we derive new bounds for the agents' regret; furthermore, under a standard diagonal concavity assumption, we show that the induced sequence of play converges to Nash equilibrium (NE) with probability 1, even if the delay between choosing an action and receiving the corresponding reward is unbounded.
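
To make the priority-queue idea concrete, here is a minimal sketch and not the authors' algorithm. It assumes a single player with a one-dimensional action in [lo, hi], a one-point (single-sample) gradient estimate built from the observed payoff alone, and a queue that always serves the oldest available reward first. The function name `delayed_bandit_ascent`, the step-size and query-radius schedules, and the `payoff` and `delay` callables are all hypothetical choices made for illustration.

```python
import heapq
import random


def delayed_bandit_ascent(payoff, delay, horizon, x0=0.0, lo=-1.0, hi=1.0, seed=0):
    """Illustrative payoff-based ascent with delayed feedback (assumed setup).

    payoff(a) -> float : reward for playing action a (only this value is observed)
    delay(t)  -> int   : non-negative number of rounds before round t's reward arrives
    Returns the final base point and the origin rounds of the rewards used, in order.
    """
    rng = random.Random(seed)
    x = x0
    in_flight = []  # heap keyed by arrival round: (arrival, origin, estimate)
    pending = []    # priority queue keyed by origin round: (origin, estimate)
    used = []
    for t in range(1, horizon + 1):
        delta = 0.25 * t ** -0.25  # query radius (assumed schedule)
        gamma = 0.5 * t ** -0.75   # step size (assumed schedule)
        # Keep the perturbed action inside [lo, hi].
        pivot = min(max(x, lo + delta), hi - delta)
        z = rng.choice((-1.0, 1.0))
        reward = payoff(pivot + delta * z)
        # One-point gradient estimate: uses the payoff value only, no gradient.
        estimate = reward * z / delta
        heapq.heappush(in_flight, (t + delay(t), t, estimate))
        # Move every reward that has arrived by round t into the priority queue.
        while in_flight and in_flight[0][0] <= t:
            _, origin, g = heapq.heappop(in_flight)
            heapq.heappush(pending, (origin, g))
        # Consume at most one reward per round, oldest origin first.
        if pending:
            origin, g = heapq.heappop(pending)
            x = min(max(x + gamma * g, lo), hi)
            used.append(origin)
    return x, used
```

With a constant delay the rewards are consumed in the order they were generated; when one reward is held back, it jumps ahead of any newer reward that reaches the queue in the same round. If nothing has arrived yet, the base point simply stays where it is for that round.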
Main file
ICML-2020-gradient-free-online-learning-in-continuous-games-with-delayed-rewards-Paper.pdf (332.46 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03043703, version 1 (7 December 2020)

Identifiers

  • HAL Id: hal-03043703, version 1

Cite

Amélie Héliou, Panayotis Mertikopoulos, Zhengyuan Zhou. Gradient-free Online Learning in Games with Delayed Rewards. ICML 2020 - 37th International Conference on Machine Learning, Jul 2020, Vienna, Austria. pp.1-11. ⟨hal-03043703⟩
