On the Study of Cooperative Multi-Agent Policy Gradient

Guillaume Bono 1, 2 Jilles Dibangoye 1 Laëtitia Matignon 3, 1 Florian Pereyron 4 Olivier Simonin 5
1 CHROMA - Robots coopératifs et adaptés à la présence humaine en environnements dynamiques
Inria Grenoble - Rhône-Alpes, CITI - CITI Centre of Innovation in Telecommunications and Integration of Services
3 SMA - Systèmes Multi-Agents
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
5 MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
Abstract: Reinforcement Learning (RL) for decentralized partially observable Markov decision processes (Dec-POMDPs) is lagging behind the spectacular breakthroughs of single-agent RL. That is because assumptions that hold in single-agent settings are often obsolete in decentralized multi-agent systems. To tackle this issue, we investigate the foundations of policy gradient methods within the centralized training for decentralized control (CTDC) paradigm. In this paradigm, learning can be accomplished in a centralized manner while execution can still be independent. Using this insight, we establish the policy gradient theorem and compatible function approximations for decentralized multi-agent systems. The resulting actor-critic methods preserve decentralized control at the execution phase, but can also estimate the policy gradient from collective experiences guided by a centralized critic at the training phase. Experiments demonstrate that our policy gradient methods compare favorably against standard RL techniques in benchmarks from the literature.
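To make the CTDC idea concrete, the following is a minimal sketch (not the report's actual algorithm) of a centralized-critic actor-critic on a one-shot cooperative matrix game: each agent keeps its own independent softmax policy (decentralized execution), while a centralized critic over joint actions, available only during training, weights each agent's score-function update. All names and the toy reward matrix are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cooperative one-shot game (a stand-in for a Dec-POMDP stage):
# the team is rewarded only when both agents pick action 1.
REWARD = np.array([[0.0, 0.0],
                   [0.0, 1.0]])

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Decentralized actors: one logit vector per agent, no shared parameters.
logits = [np.zeros(2), np.zeros(2)]
# Centralized critic: a table over joint actions, used only at training time.
Q = np.zeros((2, 2))

alpha_actor, alpha_critic = 0.1, 0.2
for _ in range(2000):
    pis = [softmax(l) for l in logits]
    a = [rng.choice(2, p=p) for p in pis]
    r = REWARD[a[0], a[1]]
    # Critic moves toward the observed team reward for this joint action.
    Q[a[0], a[1]] += alpha_critic * (r - Q[a[0], a[1]])
    # Each actor follows its own score function, weighted by the
    # centralized critic's value of the sampled joint action.
    for i in range(2):
        grad = -pis[i]
        grad[a[i]] += 1.0
        logits[i] += alpha_actor * grad * Q[a[0], a[1]]

greedy = [int(np.argmax(softmax(l))) for l in logits]
print(greedy)  # both agents should settle on the cooperative action 1
```

At execution time only `logits` are needed, so each agent acts on its own; the joint-action critic `Q` exists solely to reduce the variance and non-stationarity of each agent's gradient estimate during training.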
Document type:
Report
[Research Report] RR-9188, INSA Lyon; INRIA. 2018

Cited literature [8 references]

https://hal.inria.fr/hal-01821677
Contributor: Guillaume Bono
Submitted on: Tuesday, July 17, 2018 - 13:48:54
Last modified on: Wednesday, July 18, 2018 - 01:16:50

File

RR-9188.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01821677, version 2

Citation

Guillaume Bono, Jilles Dibangoye, Laëtitia Matignon, Florian Pereyron, Olivier Simonin. On the Study of Cooperative Multi-Agent Policy Gradient. [Research Report] RR-9188, INSA Lyon; INRIA. 2018. 〈hal-01821677v2〉
