On the Study of Cooperative Multi-Agent Policy Gradient

Reinforcement Learning (RL) for decentralized partially observable Markov decision processes (Dec-POMDPs) lags behind the spectacular breakthroughs of single-agent RL. This is because assumptions that hold in single-agent settings are often obsolete in decentralized multi-agent systems. To tackle this issue, we investigate the foundations of policy gradient methods within the centralized training for decentralized control (CTDC) paradigm. In this paradigm, learning can be accomplished in a centralized manner while execution remains independent. Using this insight, we establish a policy gradient theorem and compatible function approximations for decentralized multi-agent systems. The resulting actor-critic methods preserve decentralized control at the execution phase, but can also estimate the policy gradient from collective experiences guided by a centralized critic at the training phase. Experiments demonstrate that our policy gradient methods compare favorably against standard RL techniques on benchmarks from the literature.
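As a rough illustration of the CTDC idea described above, the following sketch trains two decentralized softmax actors with an exact policy gradient whose weights come from a centralized critic. It is not the report's algorithm: the two-agent, two-action cooperative game, the critic table `Q` (given exactly rather than learned), the learning rate, and the helper names `policy_gradients` and `train` are hypothetical choices made for illustration, and the game is stateless, so agents have no local observations. Each actor depends only on its own parameters (decentralized execution), while the gradient for agent i weights grad log pi_i(u_i) by the joint-action value Q(u_1, u_2) (centralized training).

```python
import math
from itertools import product


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical centralized critic Q[(u1, u2)] for a 2-agent, 2-action
# cooperative game: coordinating on (0, 0) pays most, (1, 1) pays less.
Q = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}


def policy_gradients(thetas, q):
    """Exact gradient of J = E_{u ~ pi}[Q(u)] w.r.t. each agent's logits.

    The joint policy is a product of independent per-agent softmax
    policies; each agent's score function is weighted by the
    centralized critic value of the joint action.
    """
    pis = [softmax(th) for th in thetas]
    grads = [[0.0] * len(th) for th in thetas]
    for joint in product(*(range(len(th)) for th in thetas)):
        prob = 1.0
        for pi, u in zip(pis, joint):
            prob *= pi[u]
        for i, u_i in enumerate(joint):
            for a in range(len(thetas[i])):
                # d/d theta_i[a] of log softmax(theta_i)[u_i]
                score = (1.0 if a == u_i else 0.0) - pis[i][a]
                grads[i][a] += prob * score * q[joint]
    return grads


def train(steps=500, lr=1.0):
    """Plain gradient ascent on every agent's logits, starting uniform."""
    thetas = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        grads = policy_gradients(thetas, Q)
        thetas = [[t + lr * g for t, g in zip(th, gr)]
                  for th, gr in zip(thetas, grads)]
    return [softmax(th) for th in thetas]


if __name__ == "__main__":
    for i, pi in enumerate(train()):
        print(f"agent {i}: P(action 0) = {pi[0]:.3f}")
```

Both agents should end up placing probability close to 1 on action 0, since coordinating on (0, 0) has the highest hypothetical payoff. The expectation here is computed by enumerating joint actions; a sampled actor-critic would replace it with Monte Carlo estimates from collective experience and learn the critic instead of assuming it.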

Keywords

Partially Observable Markov Decision Processes; Decentralized Control; Actor Critic; Multi-Agent Systems

Domains

Artificial Intelligence [cs.AI]; Machine Learning [stat.ML]

Main file

RR-9188.pdf (1.79 MB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01821677

Submitted on: Tuesday 17 July 2018, 13:48:54

Last modified on: Wednesday 27 March 2024, 09:28:03

Long-term archiving on: Thursday 18 October 2018, 14:23:55

Dates and versions

hal-01821677, version 1 (22-06-2018)

hal-01821677, version 2 (17-07-2018)

Identifiers

HAL Id: hal-01821677, version 2

Cite

Guillaume Bono, Jilles Steeve Dibangoye, Laëtitia Matignon, Florian Pereyron, Olivier Simonin. On the Study of Cooperative Multi-Agent Policy Gradient. [Research Report] RR-9188, INSA Lyon; INRIA. 2018, pp.1-27. ⟨hal-01821677v2⟩
