N. Akchurina, Multiagent Reinforcement Learning: Algorithm Converging to Nash Equilibrium in General-Sum Discounted Stochastic Games, Proc. of AAMAS, 2009.

B. Banerjee and J. Peng, Adaptive policy gradient in multiagent learning, Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS '03, 2003.
DOI : 10.1145/860575.860686

V. S. Borkar, Stochastic approximation with controlled Markov noise, Systems & Control Letters, 2006.

V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, 2009.

V. S. Borkar, Stochastic approximation with two time scales, Systems & Control Letters, vol.29, issue.5, pp.291-294, 1997.
DOI : 10.1016/S0167-6911(97)90015-3

B. Bošanský, V. Lisý, M. Lanctot, J. Čermák, and M. H. M. Winands, Algorithms for computing strategies in two-player simultaneous move games, Artificial Intelligence, vol.237, pp.1-40, 2016.
DOI : 10.1016/j.artint.2016.03.005

M. Bowling and M. Veloso, Rational and Convergent Learning in Stochastic Games, Proc. of IJCAI, 2001.

S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, vol.5, issue.1, pp.1-122, 2012.

M. Buro, Solving the Oshi-Zumo Game, Advances in Computer Games, pp.361-366, 2004.
DOI : 10.1007/978-0-387-35706-5_23

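L. Busoniu, R. Babuska, and B. De Schutter, A Comprehensive Survey of Multiagent Reinforcement Learning, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol.38, issue.2, pp.156-172, 2008.
DOI : 10.1109/TSMCC.2007.913919

N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, 2006.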

J. N. Foerster, Y. M. Assael, N. de Freitas, and S. Whiteson, Learning to communicate with deep multi-agent reinforcement learning, Proc. of NIPS, 2016.

A. Greenwald, K. Hall, and R. Serrano, Correlated Q-learning, Proc. of ICML, 2003.

S. Hart and A. Mas-Colell, Uncoupled dynamics do not lead to Nash equilibrium, The American Economic Review, 2003.
DOI : 10.1142/9789814390705_0007
URL : http://www.ma.huji.ac.il/hart/papers/uncoupl.pdf

J. Heinrich, M. Lanctot, and D. Silver, Fictitious self-play in extensive-form games, Proc. of ICML, 2015.

J. Hofbauer and W. H. Sandholm, On the Global Convergence of Stochastic Fictitious Play, Econometrica, vol.70, issue.6, pp.2265-2294, 2002.
DOI : 10.1111/1468-0262.00376

J. Hu and M. P. Wellman, Nash Q-Learning for General-Sum Stochastic Games, Journal of Machine Learning Research, vol.4, pp.1039-1069, 2003.

V. Kovařík and V. Lisý, Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games, 2015.

M. Lanctot, K. Waugh, M. Zinkevich, and M. Bowling, Monte Carlo sampling for regret minimization in extensive games, Proc. of NIPS, 2009.

M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Perolat, D. Silver, and T. Graepel, A unified game-theoretic approach to multiagent reinforcement learning, Proc. of NIPS, 2017.

G. J. Laurent, L. Matignon, and N. Le Fort-Piat, The world of independent learners is not Markovian, International Journal of Knowledge-based and Intelligent Engineering Systems, vol.15, issue.1, pp.55-64, 2011.
DOI : 10.3233/KES-2010-0206
URL : https://hal.archives-ouvertes.fr/hal-00601941

D. S. Leslie and E. J. Collins, Generalised weakened fictitious play, Games and Economic Behavior, vol.56, issue.2, pp.285-298, 2006.
DOI : 10.1016/j.geb.2005.08.005
URL : http://www.maths.bris.ac.uk/~madsl/research/LeslieCollinsGEB06.pdf

V. Lisý, V. Kovařík, M. Lanctot, and B. Bošanský, Convergence of Monte Carlo tree search in simultaneous move games, Proc. of NIPS, 2013.

M. L. Littman, Markov games as a framework for multi-agent reinforcement learning, Proc. of ICML, 1994.
DOI : 10.1016/B978-1-55860-335-6.50027-1
URL : http://www.ee.duke.edu/~lcarin/emag/seminar_presentations/Markov_Games_Littman.pdf

J. Perolat, B. Scherrer, B. Piot, and O. Pietquin, Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games, Proc. of ICML, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01153270

H. Prasad, L. Prashanth, and S. Bhatnagar, Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games, Proc. of AAMAS, 2015.

J. Robinson, An Iterative Method of Solving a Game, The Annals of Mathematics, vol.54, issue.2, pp.296-301, 1951.
DOI : 10.2307/1969530

B. Scherrer and M. Geist, Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search, Proc. of ECML, 2014.
DOI : 10.1007/978-3-662-44845-8_3
URL : https://hal.archives-ouvertes.fr/hal-01091079

L. S. Shapley, Some topics in two-person games, Advances in Game Theory, 1964.

Y. Shoham, R. Powers, and T. Grenager, If multi-agent learning is the answer, what is the question?, Artificial Intelligence, vol.171, issue.7, pp.365-377, 2007.
DOI : 10.1016/j.artint.2006.02.006

M. Zinkevich, M. Johanson, M. Bowling, and C. Piccione, Regret minimization in games with incomplete information, Proc. of NIPS, 2007.