Conference paper, 2015

Approximate dynamic programming for two-player zero-sum Markov games

Abstract

This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L_p-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we can achieve a stationary policy which is (2γε + ε′)/(1−γ)²-optimal, where ε is the value function approximation error and ε′ is the approximate greedy operator error. In addition, we provide a practical algorithm (AGPI-Q) to solve infinite-horizon γ-discounted two-player zero-sum Stochastic Games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decision Processes from data) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.
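Since AGPI-Q extends Fitted-Q to the two-player setting, the flavor of such a batch iteration can be conveyed by a short sketch. The following Python snippet is a minimal, hypothetical illustration, not the authors' implementation: it shows only the Approximate Value Iteration special case (one fitted regression per minimax backup) for a simultaneous-move game with small finite action sets. The transition-tuple format, the LP-based matrix-game solver, and the extra-trees regressor are assumptions for illustration, not details drawn from the paper.

```python
# A minimal sketch, NOT the paper's AGPI-Q: the Approximate Value
# Iteration special case of a Fitted-Q-style batch algorithm for a
# simultaneous-move zero-sum game. All names and formats below are
# illustrative assumptions.
import numpy as np
from scipy.optimize import linprog
from sklearn.ensemble import ExtraTreesRegressor

def matrix_game_value(Q):
    """Value max_x min_y x^T Q y of a zero-sum matrix game, via an LP.
    Q: (n_a, n_b) payoff matrix for the maximizing (row) player."""
    n_a, n_b = Q.shape
    # Variables (x_1..x_{n_a}, v): maximize v subject to
    # (x^T Q)_b >= v for every column b, sum(x) = 1, x >= 0.
    c = np.zeros(n_a + 1); c[-1] = -1.0               # linprog minimizes -v
    A_ub = np.hstack([-Q.T, np.ones((n_b, 1))])       # v - (x^T Q)_b <= 0
    b_ub = np.zeros(n_b)
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    bounds = [(0, None)] * n_a + [(None, None)]       # v is free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return -res.fun

def fitted_q_zero_sum(batch, n_actions, n_iters=50, gamma=0.9):
    """batch: list of (s, a, b, r, s2) transitions, with s and s2 feature
    vectors (hypothetical format). Returns an approximation of Q*(s, a, b)."""
    S  = np.array([t[0] for t in batch]); A = np.array([t[1] for t in batch])
    B  = np.array([t[2] for t in batch]); R = np.array([t[3] for t in batch])
    S2 = np.array([t[4] for t in batch])
    X = np.hstack([S, A[:, None], B[:, None]])
    ab = np.array([[a, b] for a in range(n_actions) for b in range(n_actions)])
    reg = None
    for _ in range(n_iters):
        if reg is None:
            y = R.astype(float)                       # first target: one-step reward
        else:
            y = np.empty(len(batch))
            for i, s2 in enumerate(S2):
                rows = np.hstack([np.repeat(s2[None], len(ab), axis=0), ab])
                Q_s2 = reg.predict(rows).reshape(n_actions, n_actions)
                y[i] = R[i] + gamma * matrix_game_value(Q_s2)  # minimax backup
        reg = ExtraTreesRegressor(n_estimators=50).fit(X, y)   # non-parametric fit
    return reg
```

The minimax backup (solving a small matrix game at each successor state) is what replaces the single-agent max of standard Fitted-Q; the paper's AGPI-Q additionally interleaves several policy-evaluation steps between greedy steps.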
Main file: ICML_2015_JPBSBPOP.pdf (1.36 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01153270, version 1 (21-05-2015)
hal-01153270, version 2 (22-05-2015)
hal-01153270, version 3 (27-05-2015)

Identifiers

  • HAL Id: hal-01153270, version 3

Cite

Julien Perolat, Bruno Scherrer, Bilal Piot, Olivier Pietquin. Approximate dynamic programming for two-player zero-sum Markov games. International Conference on Machine Learning (ICML 2015), Jul 2015, Lille, France. ⟨hal-01153270v3⟩
347 Views
370 Downloads
