Learning to Act in Decentralized Partially Observable MDPs

Jilles Dibangoye; Olivier Buffet

Report (Research Report), Year: 2018

French title: Apprendre à agir dans un Dec-POMDP

Jilles Dibangoye

Role: Author
PersonId: 4917
IdHAL: jilles-steeve-dibangoye
ORCID: 0000-0001-8826-4438
IdRef: 144368145

Robots coopératifs et adaptés à la présence humaine en environnements dynamiques

CITI Centre of Innovation in Telecommunications and Integration of services

Université de Lyon

Olivier Buffet

Role: Author
PersonId: 1407
IdHAL: olivier-buffet
ORCID: 0000-0002-5072-5857

Laboratoire Lorrain de Recherche en Informatique et ses Applications

Lifelong Autonomy and interaction skills for Robots in a Sensing ENvironment

Abstract

We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes (Dec-POMDPs). Previous attempts focused on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under certain conditions, the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace the greedy maximization with mixed-integer linear programming. Experiments show that our approach can learn to act near-optimally in many finite domains from the literature.

Keywords

Multi-Agent Reinforcement Learning Decentralized Partially Observable Stochastic Control

Subject areas

Artificial Intelligence [cs.AI]; Combinatorics [math.CO]; Optimization and Control [math.OC]

Main file

RR-9179.pdf (717.87 KB)

Origin: Files produced by the author(s)

Contributor: Jilles Steeve Dibangoye

https://inria.hal.science/hal-01809897

Submitted on: Thursday, June 7, 2018, 11:55:32

Last modified on: Monday, April 8, 2024, 11:16:22

Long-term archiving on: Saturday, September 8, 2018, 13:50:50

Dates and versions

hal-01809897, version 1 (June 7, 2018)

Identifiers

HAL Id: hal-01809897, version 1

Cite

Jilles Dibangoye, Olivier Buffet. Learning to Act in Decentralized Partially Observable MDPs. [Research Report] RR-9179, INRIA Grenoble - Rhône-Alpes - CHROMA Team; INRIA Nancy, LARSEN Team. 2018. ⟨hal-01809897⟩

Collections

CNRS INRIA INSA-LYON INRIA-RRRT UNIV-LORRAINE INRIA2 TDS-MACS LORIA LORIA-AIS LARA LABEXIMU CITI INSA-GROUPE UDL

301 views

521 downloads