
Sur le principe d'optimalité de Bellman pour les zs-POSG (On Bellman's Optimality Principle for zs-POSGs)

Olivier Buffet (1), Jilles Dibangoye (2), Aurélien Delage (1, 2), Abdallah Saffidine (3), Vincent Thomas (1)
1 LARSEN - Lifelong Autonomy and interaction skills for Robots in a Sensing ENvironment
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
2 CHROMA - Robots coopératifs et adaptés à la présence humaine en environnements dynamiques
Inria Grenoble - Rhône-Alpes, CITI - CITI Centre of Innovation in Telecommunications and Integration of services
Abstract : Many non-trivial sequential decision-making problems are efficiently solved by relying on Bellman's optimality principle, i.e., exploiting the fact that sub-problems are nested recursively within the original problem. Here we show how it can apply to (infinite horizon) 2-player zero-sum partially observable stochastic games (zs-POSGs) by (i) taking a central planner's viewpoint, which can only reason on a sufficient statistic called the occupancy state, and (ii) turning such problems into zero-sum occupancy Markov games (zs-OMGs). Then, exploiting the Lipschitz-continuity of the value function in occupancy space, one can derive a version of the HSVI algorithm (Heuristic Search Value Iteration) that provably finds an ε-Nash equilibrium in finite time.
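As a rough illustration of the Bellman optimality principle mentioned in the abstract, here is a minimal value-iteration sketch on a single-agent, fully observable MDP. It is not the zs-POSG algorithm of the paper: the two-state model, the discount factor, and all names (TRANSITIONS, bellman_backup, value_iteration) are hypothetical and chosen only to show how the value of a problem is computed from the values of its nested sub-problems.

```python
# Illustrative only: a hypothetical 2-state, 2-action MDP (not from the paper).
GAMMA = 0.9  # assumed discount factor

# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}


def bellman_backup(values, state):
    """Apply the Bellman optimality operator once at `state`.

    The value of `state` is the best, over actions, of the expected
    immediate reward plus the discounted value of the successor state,
    i.e. of the sub-problem that starts there.
    """
    return max(
        sum(p * (r + GAMMA * values[s2]) for p, s2, r in outcomes)
        for outcomes in TRANSITIONS[state].values()
    )


def value_iteration(tol=1e-8):
    """Repeat Bellman backups until successive value estimates differ by less than `tol`."""
    values = {s: 0.0 for s in TRANSITIONS}
    while True:
        new_values = {s: bellman_backup(values, s) for s in TRANSITIONS}
        if max(abs(new_values[s] - values[s]) for s in TRANSITIONS) < tol:
            return new_values
        values = new_values
```

On this toy model, value_iteration() converges to values of about 19 for state 0 and 20 for state 1. The zero-sum, partially observable setting of the paper replaces the plain maximisation with a max-min over the two players and works on occupancy states rather than on states, which this sketch does not attempt to reproduce.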
Document type :
Conference papers
Contributor : Olivier Buffet
Submitted on : Friday, December 18, 2020 - 10:10:23 AM
Last modification on : Thursday, January 20, 2022 - 5:26:12 PM
Long-term archiving on : Friday, March 19, 2021 - 8:15:34 PM


  • HAL Id : hal-03081320, version 1


Olivier Buffet, Jilles Dibangoye, Aurélien Delage, Abdallah Saffidine, Vincent Thomas. Sur le principe d'optimalité de Bellman pour les zs-POSG. JFPDA 2020 - Journées Francophones sur la Planification, la Décision et l’Apprentissage pour la conduite de systèmes, Jun 2020, Angers (virtual), France. pp.1-3. ⟨hal-03081320⟩


