A. Antos, C. Szepesvári, and R. Munos, Fitted Q-Iteration in Continuous Action-Space MDPs, Proc. of NIPS, pp.9-16, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00185311

M. Bowling and M. Veloso, Rational and Convergent Learning in Stochastic Games, Proc. of IJCAI, pp.1021-1026, 2001.

L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and Regression Trees, 1984.

L. Busoniu, R. Babuska, and B. De Schutter, A Comprehensive Survey of Multiagent Reinforcement Learning, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol.38, issue.2, pp.156-172, 2008.
DOI : 10.1109/TSMCC.2007.913919

D. Ernst, P. Geurts, and L. Wehenkel, Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, pp.503-556, 2005.

A. Farahmand, C. Szepesvári, and R. Munos, Error Propagation for Approximate Policy and Value Iteration, Proc. of NIPS, pp.568-576, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830154

V. Gabillon, A. Lazaric, M. Ghavamzadeh, and B. Scherrer, Classification-Based Policy Iteration with a Critic, Proc. of ICML, pp.1049-1056, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00590972

T. D. Hansen, P. B. Miltersen, and U. Zwick, Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor, Journal of the ACM, vol.60, issue.1, p.1, 2013.
DOI : 10.1145/2432622.2432623

J. Hu and M. P. Wellman, Nash Q-Learning for General-Sum Stochastic Games, JMLR, vol.4, pp.1039-1069, 2003.

N. Karmarkar, A New Polynomial-time Algorithm for Linear Programming, Proc. of ACM Symposium on Theory of Computing, pp.302-311, 1984.

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, pp.1107-1149, 2003.

M. G. Lagoudakis and R. Parr, Reinforcement Learning as Classification: Leveraging Modern Classifiers, Proc. of ICML, pp.424-431, 2003.

M. G. Lagoudakis and R. Parr, Value Function Approximation in Zero-Sum Markov Games, Proc. of UAI, pp.283-292, 2002.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a Classification-Based Policy Iteration Algorithm, Proc. of ICML, pp.607-614, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065

M. L. Littman, Markov games as a framework for multi-agent reinforcement learning, Proc. of ICML, pp.157-163, 1994.
DOI : 10.1016/B978-1-55860-335-6.50027-1

C. Meyer, J.-G. Ganascia, and J. Zucker, Learning Strategies in Games by Anticipation, Proc. of IJCAI, pp.698-707, 1997.

R. Munos and C. Szepesvári, Finite-time bounds for fitted value iteration, JMLR, vol.9, pp.815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882

S. D. Patek, Stochastic Shortest Path Games, SIAM Journal on Control and Optimization, vol.37, issue.3, 1997.
DOI : 10.1137/S0363012996299557

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
DOI : 10.1002/9780470316887

B. Scherrer and B. Lesner, On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes, Proc. of NIPS, pp.1826-1834, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758809

B. Scherrer, M. Ghavamzadeh, V. Gabillon, and M. Geist, Approximate Modified Policy Iteration, Proc. of ICML, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758882

L. S. Shapley, Stochastic Games, Proceedings of the National Academy of Sciences, vol.39, issue.10, p.1095, 1953.
DOI : 10.1073/pnas.39.10.1953

J. van der Wal, Discounted Markov Games: Generalized Policy Iteration Method, Journal of Optimization Theory and Applications, vol.30, issue.1, pp.125-138, 1978.
DOI : 10.1007/BF00933260

J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, 1944.