Optimally solving Dec-POMDPs as Continuous-State MDPs: Theory and Algorithms

Jilles Steeve Dibangoye; Christopher Amato; Olivier Buffet; François Charpillet

Report (Research Report), Year: 2014


Jilles Steeve Dibangoye

Role: Author
PersonId: 4917
IdHAL: jilles-steeve-dibangoye
ORCID: 0000-0001-8826-4438
IdRef: 144368145

Autonomous intelligent machine

Christopher Amato

Role: Author
PersonId: 934158

Computer Science and Artificial Intelligence Laboratory [Cambridge]

Olivier Buffet

Role: Author
PersonId: 1407
IdHAL: olivier-buffet
ORCID: 0000-0002-5072-5857

Autonomous intelligent machine

François Charpillet

Role: Author
PersonId: 1910
IdHAL: francois-charpillet
ORCID: 0000-0001-8260-1536
IdRef: 070140553

Autonomous intelligent machine

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in cooperative decentralized settings, but are difficult to solve optimally (NEXP-complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be distributed. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. When the curse of dimensionality becomes too prohibitive, we refine this basic approach and present ways to combine heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to eventually converge to an optimal solution. In particular, we introduce feature-based heuristic search that relies on feature-based compact representations, point-based updates, and efficient action selection. A theoretical analysis demonstrates that our feature-based heuristic search algorithms terminate in finite time with an optimal solution. We include an extensive empirical analysis using well-known benchmarks, thereby demonstrating that our approach provides significant scalability improvements compared to the state of the art.
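
As a minimal illustration of what a piecewise-linear and convex value function looks like, the sketch below evaluates such a function as the maximum over a finite set of linear pieces. It is not the report's algorithm: the occupancy state is treated simply as a probability vector (an assumption, since the abstract does not spell out its definition), and the name `pwlc_value` and all numbers are hypothetical.

```python
from typing import Sequence


def pwlc_value(occupancy: Sequence[float],
               alphas: Sequence[Sequence[float]]) -> float:
    """Evaluate V(s) = max over linear pieces alpha of the dot product <alpha, s>."""
    return max(sum(a * p for a, p in zip(alpha, occupancy)) for alpha in alphas)


# Hypothetical linear pieces and occupancy states (assumed to be probability vectors).
alphas = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.5, 0.5, 0.5)]
s1 = (1.0, 0.0, 0.0)
s2 = (0.0, 1.0, 0.0)
mid = tuple(0.5 * x + 0.5 * y for x, y in zip(s1, s2))

v1 = pwlc_value(s1, alphas)
v2 = pwlc_value(s2, alphas)
v_mid = pwlc_value(mid, alphas)

# Convexity: the value at the midpoint never exceeds the average of the endpoint values.
assert v_mid <= 0.5 * v1 + 0.5 * v2
```

A maximum of linear functions is always convex, which is why this representation pairs naturally with point-based updates: each update can add one linear piece without disturbing the others.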

Keywords

Decentralized Control; Decentralized Partially Observable Markov Decision Processes; Dec-POMDPs; automated planning; multi-agent systems; uncertainty

Domains

Artificial Intelligence [cs.AI]

Main file

RR-8517.pdf (1.07 MB)

Origin: Files produced by the author(s)

Contributor: Olivier Buffet

https://inria.hal.science/hal-00975802

Submitted on: Wednesday, April 9, 2014, 11:38:06

Last modified on: Thursday, February 1, 2024, 10:06:27

Long-term archiving on: Wednesday, July 9, 2014, 11:50:54

Dates and versions

hal-00975802, version 1 (09-04-2014)

Identifiers

HAL Id: hal-00975802, version 1

Cite

Jilles Steeve Dibangoye, Christopher Amato, Olivier Buffet, François Charpillet. Optimally solving Dec-POMDPs as Continuous-State MDPs: Theory and Algorithms. [Research Report] RR-8517, INRIA. 2014, pp.77. ⟨hal-00975802⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA INRIA-RRRT UNIV-LORRAINE INRIA2 LORIA LORIA-AIS LARA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

279 Views

430 Downloads
