Learning to Act in Continuous Dec-POMDPs

Abstract: We address a long-standing open problem of reinforcement learning in continuous decentralized partially observable Markov decision processes (Dec-POMDPs). Previous attempts focused on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under mild conditions, the first optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace greedy maximization with mixed-integer linear programming. Experiments show that our approach can learn to act optimally in many finite domains from the literature.
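The greedy step mentioned above (selecting a decentralized decision rule that maximizes a linear objective) can be cast as a mixed-integer linear program. The following is a minimal, hypothetical sketch using PuLP for a two-agent, one-step case: the weights w, the observation and action sets, and the product-linearization constraints are illustrative assumptions, not the paper's exact formulation.

# Illustrative sketch only (assumed setup, not the paper's formulation):
# choose a decentralized one-step decision rule for two agents via a MILP
# instead of enumerating all joint decision rules.
import itertools
import pulp

obs1, obs2 = ["o1a", "o1b"], ["o2a", "o2b"]          # private observations (hypothetical)
acts1, acts2 = ["left", "right"], ["left", "right"]  # private actions (hypothetical)
# Toy payoff: the agents are rewarded for coordinating on the same action.
w = {(o1, o2): {(a1, a2): 1.0 if a1 == a2 else 0.0
                for a1 in acts1 for a2 in acts2}
     for o1 in obs1 for o2 in obs2}

prob = pulp.LpProblem("decentralized_greedy_step", pulp.LpMaximize)

# Binary variables: x1[o1][a1] = 1 iff agent 1 plays a1 after observing o1 (same for agent 2).
x1 = pulp.LpVariable.dicts("x1", (obs1, acts1), cat="Binary")
x2 = pulp.LpVariable.dicts("x2", (obs2, acts2), cat="Binary")
# y linearizes the product x1[o1][a1] * x2[o2][a2].
y = pulp.LpVariable.dicts("y", (obs1, obs2, acts1, acts2), lowBound=0, upBound=1)

# Each agent commits to exactly one action per private observation.
for o1 in obs1:
    prob += pulp.lpSum(x1[o1][a1] for a1 in acts1) == 1
for o2 in obs2:
    prob += pulp.lpSum(x2[o2][a2] for a2 in acts2) == 1

# Standard linearization constraints so y equals the product of the two binaries.
for o1, o2, a1, a2 in itertools.product(obs1, obs2, acts1, acts2):
    prob += y[o1][o2][a1][a2] <= x1[o1][a1]
    prob += y[o1][o2][a1][a2] <= x2[o2][a2]
    prob += y[o1][o2][a1][a2] >= x1[o1][a1] + x2[o2][a2] - 1

# Maximize the value of the joint decision rule under the assumed weights.
prob += pulp.lpSum(w[(o1, o2)][(a1, a2)] * y[o1][o2][a1][a2]
                   for o1, o2, a1, a2 in itertools.product(obs1, obs2, acts1, acts2))

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for o1 in obs1:
    print(o1, [a1 for a1 in acts1 if pulp.value(x1[o1][a1]) > 0.5])

Note that only the per-agent action choices need to be integer: the linearization constraints force each y variable to the product of the corresponding binaries at any feasible point, so no joint-action variable has to be declared binary.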
Document type:
Conference papers

https://hal.inria.fr/hal-01840602
Contributor: Olivier Buffet
Submitted on: Monday, July 16, 2018 - 3:08:15 PM

File

JFPDA_2018_paper_5.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01840602, version 1

Citation

Jilles Dibangoye, Olivier Buffet. Learning to Act in Continuous Dec-POMDPs. JFPDA 2018 - Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes, Jul 2018, Nancy, France. pp.1-10. ⟨hal-01840602⟩
