Approximate dynamic programming for two-player zero-sum Markov games

Abstract : This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L p-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteratio,n). We show that we can achieve a stationary policy which is 2γ+ (1−γ) 2-optimal, where is the value function approximation error and is the approximate greedy operator error. In addition , we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum Stochastic Games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decisions Processes from data) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.
Complete list of metadatas

Cited literature [24 references]  Display  Hide  Download

https://hal.inria.fr/hal-01153270
Contributor : Olivier Pietquin <>
Submitted on : Wednesday, May 27, 2015 - 5:15:35 PM
Last modification on : Thursday, April 4, 2019 - 10:18:05 AM
Long-term archiving on : Monday, April 24, 2017 - 4:52:15 PM

File

ICML_2015_JPBSBPOP.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01153270, version 3

Citation

Julien Perolat, Bruno Scherrer, Bilal Piot, Olivier Pietquin. Approximate dynamic programming for two-player zero-sum Markov games. International Conference on Machine Learning (ICML 2015), Jul 2015, Lille, France. ⟨hal-01153270v3⟩

Share

Metrics

Record views

414

Files downloads

317