Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information

Marianne Akian; Jean Cochet-Terrasson; Sylvie Detournay; Stéphane Gaubert

Preprint, working paper. Year: 2012

Marianne Akian

Role: Author
PersonId: 830429

Centre de Mathématiques Appliquées - Ecole Polytechnique

Max-plus algebras and mathematics of decision

Jean Cochet-Terrasson

Role: Author
PersonId: 935262

Contrôle général des armées

Sylvie Detournay

Role: Author
PersonId: 935260

Centre de Mathématiques Appliquées - Ecole Polytechnique

Max-plus algebras and mathematics of decision

Stéphane Gaubert

Role: Author
PersonId: 1887
IdHAL: stephane-gaubert
IdRef: 104895306

Centre de Mathématiques Appliquées - Ecole Polytechnique

Max-plus algebras and mathematics of decision

Abstract

We consider zero-sum stochastic games with finite state and action spaces, perfect information and mean payoff criteria, without any irreducibility assumption on the Markov chains associated with strategies (multichain games). The value of such a game can be characterized by a system of nonlinear equations involving the mean payoff vector and an auxiliary vector (the relative value, or bias). Here we develop a policy iteration algorithm for zero-sum stochastic games with mean payoff, following an idea of two of the authors (Cochet-Terrasson and Gaubert, C. R. Math. Acad. Sci. Paris, 2006). The algorithm relies on a notion of nonlinear spectral projection (Akian and Gaubert, Nonlinear Analysis TMA, 2003), which is analogous to the notion of reduction of superharmonic functions in linear potential theory. To avoid cycling, at each degenerate iteration (one in which the mean payoff vector is not improved), the new relative value is obtained by reducing the previous one. We show that the sequence of values and relative values satisfies a lexicographic monotonicity property, which implies that the algorithm terminates. We illustrate the algorithm with a mean-payoff version of Richman games (stochastic tug-of-war, or discrete infinity-Laplacian-type equations), in which degenerate iterations are frequent. We report numerical experiments on large-scale instances arising from these games, as well as from monotone discretizations of a deterministic mean-payoff pursuit-evasion differential game.
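The system of nonlinear equations mentioned in the abstract is built from the game's Shapley (dynamic programming) operator T. The sketch below assumes the common tug-of-war form of a mean-payoff Richman game, T(v)_i = r_i + (max_{j in N(i)} v_j + min_{j in N(i)} v_j) / 2. Here N(i) is the set of successors of state i and r_i is a per-step reward. Under that assumption, it estimates the mean payoff vector by value iteration, eta ≈ T^k(0) / k. This is plain value iteration, not the authors' policy iteration algorithm. The helper names richman_operator and mean_payoff_value_iteration, as well as the toy graph, are hypothetical.

```python
from typing import List, Sequence


def richman_operator(v: Sequence[float], rewards: Sequence[float],
                     neighbors: Sequence[Sequence[int]]) -> List[float]:
    """Assumed Shapley operator of a mean-payoff Richman (tug-of-war) game.

    A fair coin decides who moves: MAX picks the successor with the largest
    value, MIN the one with the smallest; the reward r_i of the current state
    is collected at each step.
    """
    return [
        rewards[i] + 0.5 * (max(v[j] for j in neighbors[i]) + min(v[j] for j in neighbors[i]))
        for i in range(len(v))
    ]


def mean_payoff_value_iteration(rewards: Sequence[float],
                                neighbors: Sequence[Sequence[int]],
                                iterations: int = 2000) -> List[float]:
    """Estimate the mean payoff vector eta as T^k(0) / k (value iteration)."""
    v = [0.0] * len(rewards)
    for _ in range(iterations):
        v = richman_operator(v, rewards, neighbors)
    return [x / iterations for x in v]


if __name__ == "__main__":
    # Hypothetical multichain instance: hub state 0 has successors 1, 2, 3;
    # states 1, 2, 3 are absorbing (self-loops) with rewards 1, 0 and 0.2.
    rewards = [0.0, 1.0, 0.0, 0.2]
    neighbors = [[1, 2, 3], [1], [2], [3]]
    eta = mean_payoff_value_iteration(rewards, neighbors)
    print([round(x, 3) for x in eta])  # -> [0.5, 1.0, 0.0, 0.2]
```

In this toy instance the mean payoff differs from state to state, which is the multichain situation the abstract addresses. From the hub, the coin-toss winner sends the token to the best or worst absorbing state, giving (1 + 0) / 2 = 0.5, and the middle option 0.2 is never chosen. The estimate T^k(0) / k only converges at rate O(1/k). By contrast, the policy iteration described in the abstract terminates after finitely many steps.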

Subjects

Computer Science and Game Theory [cs.GT]; Optimization and Control [math.OC]

Contributor: Marianne Akian

https://inria.hal.science/hal-00773080

Submitted on: Friday 11 January 2013, 15:40:40

Last modified on: Thursday 2 May 2024, 13:36:44

Dates and versions

hal-00773080 , version 1 (11-01-2013)

Identifiers

HAL Id: hal-00773080, version 1
arXiv: 1208.0446

Cite

Marianne Akian, Jean Cochet-Terrasson, Sylvie Detournay, Stéphane Gaubert. Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information. 2012. ⟨hal-00773080⟩

Collections

X CNRS INRIA INSMI X-CMAP X-DEP-MATHA CMAP INRIA2 TDS-MACS
