On Bellman's Optimality Principle for zs-POSGs

Many non-trivial sequential decision-making problems are efficiently solved by relying on Bellman's optimality principle, i.e., exploiting the fact that sub-problems are nested recursively within the original problem. Here we show how it can apply to (infinite horizon) 2-player zero-sum partially observable stochastic games (zs-POSGs) by (i) taking a central planner's viewpoint, which can only reason on a sufficient statistic called occupancy state, and (ii) turning such problems into zero-sum occupancy Markov games (zs-OMGs). Then, exploiting the Lipschitz-continuity of the value function in occupancy space, one can derive a version of the HSVI algorithm (Heuristic Search Value Iteration) that provably finds an ε-Nash equilibrium in finite time.
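
The role Lipschitz-continuity plays in such a bounding scheme can be illustrated with a minimal sketch. This is not the paper's bound construction: it only shows the generic fact that, if a function is Lipschitz with constant `lam` under the L1 norm, a finite set of evaluated points yields a lower and an upper bound everywhere. HSVI-style algorithms maintain bounds of this kind and tighten them until their gap is small enough. The function names, the constant `lam`, and the sample values below are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical sample type: (point in occupancy space, known value at that point).
Sample = Tuple[Sequence[float], float]


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """L1 distance between two vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def lipschitz_bounds(
    x: Sequence[float], samples: Sequence[Sample], lam: float
) -> Tuple[float, float]:
    """Return (lower, upper) bounds on f(x) for a lam-Lipschitz function f.

    Each sample (p, v) with f(p) = v implies
        v - lam * ||x - p||_1 <= f(x) <= v + lam * ||x - p||_1,
    so the tightest bounds are the max of the lower cones
    and the min of the upper cones.
    """
    lower = max(v - lam * l1_distance(x, p) for p, v in samples)
    upper = min(v + lam * l1_distance(x, p) for p, v in samples)
    return lower, upper


# Illustrative data (assumed): two evaluated points and a Lipschitz constant.
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), 0.0)]
lam = 2.0

# Bounds are loose away from the samples and tight at a sampled point.
gap_mid = lipschitz_bounds((0.5, 0.5), samples, lam)
gap_at_sample = lipschitz_bounds((1.0, 0.0), samples, lam)
```

Evaluating more points shrinks the gap between the two bounds, which is the quantity a search procedure would drive below ε.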

Keywords

POSG; partially observable stochastic game; Bellman's optimality principle; Heuristic Search Value Iteration

Domains

Artificial Intelligence [cs.AI]

Main file

2006.16395.pdf (466.94 KB)

Origin: Files produced by the author(s)

Contributor: Olivier Buffet

https://inria.hal.science/hal-03080287

Submitted on: Friday, December 18, 2020, 10:58:09

Last modified on: Monday, September 11, 2023, 17:41:19

Long-term archiving on: Friday, March 19, 2021, 18:14:43

Dates and versions

hal-03080287, version 1 (18-12-2020)

Identifiers

HAL Id: hal-03080287, version 1
arXiv: 2006.16395v1

Cite

Olivier Buffet, Jilles Dibangoye, Aurélien Delage, Abdallah Saffidine, Vincent Thomas. On Bellman's Optimality Principle for zs-POSGs. 2020. ⟨hal-03080287⟩

Collections

CNRS INRIA INSA-LYON UNIV-LORRAINE INRIA2 LORIA LORIA-AIS CITI INSA-GROUPE UDL ANR
