Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games

Julien Perolat 1, 2, 3 Bruno Scherrer 4, 5 Bilal Piot 2, 6, 1 Olivier Pietquin 2, 3, 7, 1
1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
4 Probabilités et statistiques
IECL - Institut Élie Cartan de Lorraine
5 BIGS - Biology, genetics and statistics
Inria Nancy - Grand Est, IECL - Institut Élie Cartan de Lorraine
Abstract : This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L p-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteratio,n). We show that we can achieve a stationary policy which is 2γ+ (1−γ) 2-optimal, where is the value function approximation error and is the approximate greedy operator error. In addition , we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum Stochastic Games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decisions Processes from data) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.
Type de document :
Communication dans un congrès
International Conference on Machine Learning (ICML 2015), Jul 2015, Lille, France. 2015, 〈http://jmlr.org/proceedings/papers/v37/perolat15.pdf〉
Liste complète des métadonnées

Littérature citée [24 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/hal-01153270
Contributeur : Olivier Pietquin <>
Soumis le : mercredi 27 mai 2015 - 17:15:35
Dernière modification le : mardi 3 juillet 2018 - 11:29:41
Document(s) archivé(s) le : lundi 24 avril 2017 - 16:52:15

Fichier

ICML_2015_JPBSBPOP.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : hal-01153270, version 3

Citation

Julien Perolat, Bruno Scherrer, Bilal Piot, Olivier Pietquin. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games. International Conference on Machine Learning (ICML 2015), Jul 2015, Lille, France. 2015, 〈http://jmlr.org/proceedings/papers/v37/perolat15.pdf〉. 〈hal-01153270v3〉

Partager

Métriques

Consultations de la notice

342

Téléchargements de fichiers

207