Report (Research Report), Year: 2018

Learning to Act in Decentralized Partially Observable MDPs

French title: Apprendre à agir dans un Dec-POMDP

Abstract

We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes (Dec-POMDPs). Previous attempts focused on different forms of generalized policy iteration, which at best lead to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. Under certain conditions, we derive the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace greedy maximization with mixed-integer linear programming. Experiments show our approach can learn to act near-optimally in many finite domains from the literature.
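
The scalability idea mentioned in the abstract, replacing exhaustive greedy maximization over joint decision rules with a mixed-integer linear program, can be illustrated on a toy two-agent problem. The sketch below is not the report's formulation: the histories, actions, occupancy state, and Q-values are made-up placeholders, and it uses the PuLP library as an off-the-shelf MILP solver. It selects one deterministic action per individual history so as to maximize the expected Q-value, linearizing the product of the two agents' binary choice variables.

# A minimal sketch, assuming a two-agent Dec-POMDP with a handful of
# individual histories and actions. Everything below (histories, actions,
# occupancy state, Q-values) is an illustrative placeholder, not data or
# code from the report. Requires: pip install pulp
import itertools
import pulp

H1, H2 = ["h1a", "h1b"], ["h2a", "h2b"]            # individual histories
A1, A2 = ["left", "right"], ["left", "right"]      # individual actions
occ = {(o1, o2): 0.25 for o1 in H1 for o2 in H2}   # toy occupancy state
Q = {(o1, o2, a1, a2): (2.0 if a1 == a2 else 0.5)  # toy Q: rewards coordination
     for o1 in H1 for o2 in H2 for a1 in A1 for a2 in A2}

prob = pulp.LpProblem("greedy_decision_rule_selection", pulp.LpMaximize)

# x_i[o][a] = 1 iff agent i plays action a after its own history o,
# so each x_i encodes a deterministic individual decision rule.
x1 = pulp.LpVariable.dicts("x1", (H1, A1), cat="Binary")
x2 = pulp.LpVariable.dicts("x2", (H2, A2), cat="Binary")

# y linearizes the product x1 * x2 so the objective remains linear.
keys = list(itertools.product(H1, H2, A1, A2))
y = pulp.LpVariable.dicts("y", keys, lowBound=0, upBound=1)

# Exactly one action per individual history.
for o1 in H1:
    prob += pulp.lpSum(x1[o1][a] for a in A1) == 1
for o2 in H2:
    prob += pulp.lpSum(x2[o2][a] for a in A2) == 1

# Standard linearization of the product of two binary variables.
for (o1, o2, a1, a2) in keys:
    prob += y[(o1, o2, a1, a2)] <= x1[o1][a1]
    prob += y[(o1, o2, a1, a2)] <= x2[o2][a2]
    prob += y[(o1, o2, a1, a2)] >= x1[o1][a1] + x2[o2][a2] - 1

# Objective: expected Q-value of the joint decision rule under occ.
prob += pulp.lpSum(occ[(o1, o2)] * Q[(o1, o2, a1, a2)] * y[(o1, o2, a1, a2)]
                   for (o1, o2, a1, a2) in keys)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for o1 in H1:
    for a1 in A1:
        if x1[o1][a1].value() == 1:
            print(f"agent 1: after history {o1}, play {a1}")

The linearization constraints keep the objective linear while still scoring joint actions, so a MILP solver can prune the exponential space of joint decision rules by branch-and-bound rather than enumerating it as plain greedy maximization would.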
Main file: RR-9179.pdf (717.87 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01809897, version 1 (7 June 2018)

Identifiers

  • HAL Id: hal-01809897, version 1

Cite

Jilles Dibangoye, Olivier Buffet. Learning to Act in Decentralized Partially Observable MDPs. [Research Report] RR-9179, INRIA Grenoble - Rhône-Alpes, CHROMA team; INRIA Nancy, LARSEN team. 2018. ⟨hal-01809897⟩
301 views
521 downloads
