Skip to Main content Skip to Navigation
New interface
Conference papers

Approximate dynamic programming for two-player zero-sum Markov games

Abstract : This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L p-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteratio,n). We show that we can achieve a stationary policy which is 2γ+ (1−γ) 2-optimal, where is the value function approximation error and is the approximate greedy operator error. In addition , we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum Stochastic Games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decisions Processes from data) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.
Complete list of metadata

Cited literature [24 references]  Display  Hide  Download
Contributor : Olivier Pietquin Connect in order to contact the contributor
Submitted on : Wednesday, May 27, 2015 - 5:15:35 PM
Last modification on : Tuesday, November 22, 2022 - 2:26:16 PM
Long-term archiving on: : Monday, April 24, 2017 - 4:52:15 PM


Files produced by the author(s)


  • HAL Id : hal-01153270, version 3


Julien Perolat, Bruno Scherrer, Bilal Piot, Olivier Pietquin. Approximate dynamic programming for two-player zero-sum Markov games. International Conference on Machine Learning (ICML 2015), Jul 2015, Lille, France. ⟨hal-01153270v3⟩



Record views


Files downloads