Learning to Act in Decentralized Partially Observable MDPs - Archive ouverte HAL
Conference Papers Year : 2018

Learning to Act in Decentralized Partially Observable MDPs

Jilles Dibangoye, Olivier Buffet

Abstract

We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes. Previous attempts focused on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under certain conditions, the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace greedy maximization with mixed-integer linear programming. Experiments show our approach can learn to act near-optimally in many finite domains from the literature.
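To make the greedy-maximization step the abstract refers to concrete, here is a minimal illustrative sketch, not the authors' algorithm: in a toy two-agent problem, it enumerates all pairs of private decision rules (each mapping an agent's observation to an action) and picks the pair that maximizes expected immediate reward under a fixed occupancy distribution. The sizes, occupancy table, and reward function below are hypothetical placeholders; the paper's point is precisely that this exponential enumeration can be replaced by a mixed-integer linear program.

    # Illustrative sketch only: exhaustive greedy maximization over joint
    # decision rules in a toy 2-agent setting. All quantities are hypothetical.
    from itertools import product

    N_STATES, N_OBS, N_ACTIONS = 3, 2, 2

    # Hypothetical occupancy state: a distribution over (state, obs1, obs2),
    # here simply uniform.
    occupancy = {
        (s, o1, o2): 1.0 / (N_STATES * N_OBS * N_OBS)
        for s in range(N_STATES)
        for o1 in range(N_OBS)
        for o2 in range(N_OBS)
    }

    # Hypothetical joint reward R(s, a1, a2): agents are rewarded for agreeing.
    def reward(s, a1, a2):
        return float(s) * (1 if a1 == a2 else -1)

    def value(rule1, rule2):
        """Expected immediate reward of a pair of private decision rules."""
        return sum(
            p * reward(s, rule1[o1], rule2[o2])
            for (s, o1, o2), p in occupancy.items()
        )

    # All deterministic private rules obs -> action for one agent.
    all_rules = [dict(zip(range(N_OBS), acts))
                 for acts in product(range(N_ACTIONS), repeat=N_OBS)]

    # Exhaustive search over joint rules; this is the exponential step the
    # paper reports replacing with a mixed-integer linear program.
    best = max(product(all_rules, repeat=2), key=lambda r: value(*r))
    print("best rules:", best, "value:", value(*best))

The number of joint rules grows as |A|^(|Z| * n) for n agents, which is why the exhaustive search above is only viable for toy sizes and the MILP reformulation matters for scalability.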
Main file

paper.pdf (364.92 KB)
Origin : Files produced by the author(s)

Dates and versions

hal-01851806, version 1 (30-07-2018)

Identifiers

  • HAL Id : hal-01851806, version 1

Cite

Jilles Dibangoye, Olivier Buffet. Learning to Act in Decentralized Partially Observable MDPs. ICML 2018 - 35th International Conference on Machine Learning, Jul 2018, Stockholm, Sweden. pp.1233-1242. ⟨hal-01851806⟩