N. Akchurina, Multi-Agent Reinforcement Learning Algorithms, 2010.

T. W. Archibald et al., On the Generation of Markov Decision Processes, Journal of the Operational Research Society, vol.46, issue.3, pp.354-361, 1995.
DOI : 10.1057/jors.1995.50

L. Baird, Residual Algorithms: Reinforcement Learning with Function Approximation, Proc. of ICML, 1995.
DOI : 10.1016/B978-1-55860-377-6.50013-X

J. Filar and K. Vrieze, Competitive Markov Decision Processes, 2012.
DOI : 10.1007/978-1-4612-4054-9

I. Goodfellow et al., Deep Learning, book in preparation, 2016.

S. Grunewalder et al., Modelling Transition Dynamics in MDPs With RKHS Embeddings, Proc. of ICML, 2012.

J. Hu and M. P. Wellman, Nash Q-Learning for General-Sum Stochastic Games, Journal of Machine Learning Research, vol.4, pp.1039-1069, 2003.

M. G. Lagoudakis and R. Parr, Value Function Approximation in Zero-Sum Markov Games, Proc. of UAI, 2002.

M. G. Lagoudakis and R. Parr, Reinforcement Learning as Classification: Leveraging Modern Classifiers, Proc. of ICML, 2003.

Y. LeCun et al., Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.
DOI : 10.1038/nature14539

T. P. Lillicrap et al., Continuous Control with Deep Reinforcement Learning, Proc. of ICLR, 2016.

M. L. Littman, Friend-or-Foe Q-Learning in General-Sum Games, Proc. of ICML, 2001.

O.-A. Maillard et al., Finite-Sample Analysis of Bellman Residual Minimization, Proc. of ACML, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830212

V. Mnih et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, pp.529-533, 2015.
DOI : 10.1038/nature14236

R. Munos and C. Szepesvári, Finite-Time Bounds for Fitted Value Iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882

N. Nisan et al., Algorithmic Game Theory, 2007.
DOI : 10.1017/CBO9780511800481

J. Pérolat et al., Softened Approximate Policy Iteration for Markov Games, Proc. of ICML, 2016.

J. Pérolat et al., On the Use of Nonstationary Strategies for Solving Two-Player Zero-Sum Markov Games, Proc. of AISTATS, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01291495

J. Pérolat et al., Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games, Proc. of ICML, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01153270

B. Piot et al., Boosted Bellman Residual Minimization Handling Expert Demonstrations, Proc. of ECML, 2014.
DOI : 10.1007/978-3-662-44851-9_35
URL : https://hal-supelec.archives-ouvertes.fr/hal-01060953/document/

B. Piot et al., Difference of convex functions programming for reinforcement learning, Proc. of NIPS, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01104419

H. L. Prasad et al., Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games, Proc. of AAMAS, 2015.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
DOI : 10.1002/9780470316887

B. Scherrer et al., Approximate Modified Policy Iteration, Proc. of ICML, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758882

L. S. Shapley, Stochastic Games, Proc. of the National Academy of Sciences of the United States of America, 1953.

G. Taylor and R. Parr, Value Function Approximation in Noisy Environments Using Locally Smoothed Regularized Approximate Linear Programs, Proc. of UAI, 2012.

M. Zinkevich et al., Cyclic Equilibria in Markov Games, Proc. of NIPS, 2006.