Approximate dynamic programming for two-player zero-sum Markov games

This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L p-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteratio,n). We show that we can achieve a stationary policy which is 2γ+ (1−γ) 2-optimal, where is the value function approximation error and is the approximate greedy operator error. In addition , we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum Stochastic Games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decisions Processes from data) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.

Domaines

Apprentissage [cs.LG] Mathématiques générales [math.GM] Recherche d'information [cs.IR]

Fichier principal

ICML_2015_JPBSBPOP.pdf (1.36 Mo)

Origine : Fichiers produits par l'(les) auteur(s)

Olivier Pietquin : Connectez-vous pour contacter le contributeur

https://inria.hal.science/hal-01153270

Soumis le : mercredi 27 mai 2015-17:15:35

Dernière modification le : mardi 16 avril 2024-03:07:48

Archivage à long terme le : lundi 24 avril 2017-16:52:15

Dates et versions

hal-01153270 , version 1 (21-05-2015)

hal-01153270 , version 2 (22-05-2015)

hal-01153270 , version 3 (27-05-2015)

Identifiants

HAL Id : hal-01153270 , version 3

Citer

Julien Perolat, Bruno Scherrer, Bilal Piot, Olivier Pietquin. Approximate dynamic programming for two-player zero-sum Markov games. International Conference on Machine Learning (ICML 2015), Jul 2015, Lille, France. ⟨hal-01153270v3⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

UNIV-RENNES1 UNIV-LILLE3 CNRS INRIA IRISA IECN CRISTAL UNIV-LORRAINE INRIA2 CRISTAL-SEQUEL UR1-MATH-STIC UR1-UFR-ISTIC IECLPS UNIV-RENNES UNIV-LILLE UR1-MATH-NUM

353 Consultations

372 Téléchargements