Softened approximate policy iteration for Markov games - Archive ouverte HAL
Conference Papers, Year: 2016

Softened approximate policy iteration for Markov games


Abstract

This paper reports theoretical and empirical investigations on the use of quasi-Newton methods to minimize the Optimal Bellman Residual (OBR) of zero-sum two-player Markov games. First, it reveals that state-of-the-art algorithms can be derived by direct application of Newton's method to different norms of the OBR. More precisely, when applied to the norm of the OBR, Newton's method results in the Bellman Residual Minimization Policy Iteration (BRMPI) algorithm and, when applied to the norm of the Projected OBR (POBR), it results in the standard Least Squares Policy Iteration (LSPI) algorithm. Consequently, new algorithms are proposed that use quasi-Newton methods to minimize the OBR and the POBR, so as to benefit from enhanced empirical performance at low cost. Indeed, the quasi-Newton approach requires only slight modifications to the implementations of LSPI and BRMPI, but significantly improves both the stability and the performance of those algorithms. These phenomena are illustrated in an experiment conducted on artificially constructed games called Garnets.
Main file:

  • nmz.pdf (375.17 KB)

Additional files:

  • Dir_Ns_100_Na_8_Nb_10_sparsity_0-5sample_49gamma_0-99.pdf (13.08 KB)
  • Dir_Ns_100_Na_8_Nb_1_sparsity_0-5sample_49gamma_0-99.pdf (13.05 KB)
  • Dir_Ns_50_Na_2_Nb_1_sparsity_0-3sample_1-0gamma_0-9.pdf (13.77 KB)
  • Dir_Ns_50_Na_2_Nb_4_sparsity_0-3sample_2-0gamma_0-9.pdf (12.86 KB)
  • icml_numpapers.eps (11.97 KB)
  • icml_numpapers.pdf (2.75 KB)

Origin: Files produced by the author(s)

Dates and versions

hal-01393328 , version 1 (07-11-2016)

Identifiers

  • HAL Id : hal-01393328 , version 1

Cite

Julien Pérolat, Bilal Piot, Matthieu Geist, Bruno Scherrer, Olivier Pietquin. Softened approximate policy iteration for Markov games. ICML 2016 - 33rd International Conference on Machine Learning, Jun 2016, New York City, United States. ⟨hal-01393328⟩
