Sur le Gradient de la Politique pour les Systèmes Multi-Agents Coopératifs [On the Policy Gradient for Cooperative Multi-Agent Systems]

Guillaume Bono 1 Jilles Dibangoye 1 Laëtitia Matignon 1, 2 Florian Pereyron 3 Olivier Simonin 1
1 CHROMA - Robots coopératifs et adaptés à la présence humaine en environnements dynamiques
CITI - CITI Centre of Innovation in Telecommunications and Integration of services, Inria Grenoble - Rhône-Alpes
2 SMA - Systèmes Multi-Agents
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract: Reinforcement learning (RL) for decentralized partially observable Markov decision processes (Dec-POMDPs) lags behind the breakthroughs of single-agent RL, because assumptions that hold in single-agent settings often fail in decentralized multi-agent systems. To tackle this issue, we investigate the foundations of policy gradient methods within the centralized training for decentralized control (CTDC) paradigm. In this paradigm, learning can be carried out in a centralized manner, while each agent still executes its policy independently at deployment. Using this insight, we establish a new policy gradient theorem and compatible function approximations for decentralized multi-agent systems. The resulting actor-critic methods preserve decentralized control at the execution phase, yet during training they estimate the policy gradient from collective experiences, guided by a centralized critic. Experiments show that our policy gradient methods compare favorably against standard RL techniques on benchmarks from the literature.
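To make the CTDC idea in the abstract concrete, here is a minimal sketch, not the paper's algorithm or its policy gradient theorem. It assumes a toy one-shot cooperative matrix game (the `REWARD` table), tabular softmax actors, and a tabular joint-action critic; all names, learning rates, and the game itself are hypothetical choices made for illustration. The critic sees the joint action during training, while each actor only ever reads its own parameters.

```python
import math
import random

# Toy one-shot cooperative game (an assumption, not a benchmark from the paper):
# both agents receive a shared reward of 1 only when they both pick action 0.
REWARD = [[1.0, 0.0], [0.0, 0.0]]


def softmax(logits):
    """Turn a list of logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=2000, actor_lr=0.5, critic_lr=0.1, seed=0):
    rng = random.Random(seed)
    # Decentralized actors: each agent owns its logits and never reads the other's.
    logits = [[0.0, 0.0], [0.0, 0.0]]
    # Centralized critic: a table over JOINT actions, used only during training.
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(episodes):
        probs = [softmax(row) for row in logits]
        # Each agent samples its own action independently of the other.
        a = [rng.choices([0, 1], weights=p)[0] for p in probs]
        r = REWARD[a[0]][a[1]]
        # Critic update: move the joint-action value toward the shared reward.
        q[a[0]][a[1]] += critic_lr * (r - q[a[0]][a[1]])
        # Actor update: the gradient of log-softmax w.r.t. the logits is
        # one_hot(action) - probs, scaled here by the centralized critic's
        # value of the sampled joint action.
        for i in range(2):
            for k in range(2):
                grad = (1.0 if k == a[i] else 0.0) - probs[i][k]
                logits[i][k] += actor_lr * q[a[0]][a[1]] * grad
    return [softmax(row) for row in logits]


policies = train()
```

After training, `policies[i]` is all that agent `i` needs to act; the critic table `q` can be discarded, which is the sense in which control stays decentralized at execution time. A real Dec-POMDP would additionally condition each actor on its local observation history, which this sketch omits.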
Document type :
Conference papers

https://hal.inria.fr/hal-01840852
Contributor : Olivier Buffet
Submitted on : Monday, July 16, 2018 - 5:24:34 PM
Last modification on : Thursday, February 7, 2019 - 4:52:22 PM
Long-term archiving on : Wednesday, October 17, 2018 - 4:53:17 PM

File

JFPDA_2018_paper_11.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01840852, version 1

Citation

Guillaume Bono, Jilles Dibangoye, Laëtitia Matignon, Florian Pereyron, Olivier Simonin. Sur le Gradient de la Politique pour les Systèmes Multi-Agents Coopératifs. JFPDA 2018 - Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes, Jul 2018, Nancy, France. pp.1-13. ⟨hal-01840852⟩
