Learning to Act in Decentralized Partially Observable MDPs

Jilles Dibangoye; Olivier Buffet

Conference paper, Year: 2018

Jilles Dibangoye

Role: Author
PersonId: 4917
IdHAL: jilles-steeve-dibangoye
ORCID: 0000-0001-8826-4438
IdRef: 144368145

Cooperative robots adapted to human presence in dynamic environments

Olivier Buffet

Role: Author
PersonId: 1407
IdHAL: olivier-buffet
ORCID: 0000-0002-5072-5857

Lifelong Autonomy and interaction skills for Robots in a Sensing ENvironment

Abstract

We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes. Previous attempts focussed on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under certain conditions, the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace the greedy maximization by mixed-integer linear programming. Experiments show our approach can learn to act near-optimally in many finite domains from the literature.

Domains

Artificial Intelligence [cs.AI]; Combinatorics [math.CO]; Optimization and Control [math.OC]

Main file

paper.pdf (364.92 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01851806

Submitted on: Monday, July 30, 2018, 23:51:21

Last modified on: Monday, April 22, 2024, 13:59:11

Long-term archiving on: Wednesday, October 31, 2018, 14:02:29

Dates and versions

hal-01851806, version 1 (30-07-2018)

Identifiers

HAL Id: hal-01851806, version 1

Cite

Jilles Dibangoye, Olivier Buffet. Learning to Act in Decentralized Partially Observable MDPs. ICML 2018 - 35th International Conference on Machine Learning, Jul 2018, Stockholm, Sweden. pp.1233-1242. ⟨hal-01851806⟩

Collections

CNRS INRIA INSA-LYON UNIV-LORRAINE INRIA2 TDS-MACS LORIA LORIA-AIS LABEXIMU CITI INSA-GROUPE UDL

271 views

132 downloads