Multiagent Reinforcement Learning: Algorithm Converging to Nash Equilibrium in General-Sum Discounted Stochastic Games, Proc. of AAMAS, 2009.
Adaptive policy gradient in multiagent learning, Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '03), 2003.
DOI : 10.1145/860575.860686
Stochastic approximation with controlled Markov noise, Systems & Control Letters, 2006.
Stochastic Approximation: A Dynamical Systems Viewpoint, 2009.
Stochastic approximation with two time scales, Systems & Control Letters, vol.29, issue.5, pp.291-294, 1997.
DOI : 10.1016/S0167-6911(97)90015-3
Algorithms for computing strategies in two-player simultaneous move games, Artificial Intelligence, vol.237, pp.1-40, 2016.
DOI : 10.1016/j.artint.2016.03.005
Rational and Convergent Learning in Stochastic Games, Proc. of IJCAI, 2001.
Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, pp.1-122, 2012.
Solving the Oshi-Zumo Game, pp.361-366, 2004.
DOI : 10.1007/978-0-387-35706-5_23
A Comprehensive Survey of Multiagent Reinforcement Learning, IEEE Transactions on Systems, Man, and Cybernetics, Part C, 2008.
DOI : 10.1109/TSMCC.2007.913919
Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games, 2006.
Learning to communicate with deep multi-agent reinforcement learning, Proc. of NIPS, 2016.
Correlated Q-learning, Proc. of ICML, 2003.
Uncoupled dynamics do not lead to Nash equilibrium, The American Economic Review, 2003.
DOI : 10.1142/9789814390705_0007
URL : http://www.ma.huji.ac.il/hart/papers/uncoupl.pdf
Fictitious self-play in extensive-form games, Proc. of ICML, 2015.
On the Global Convergence of Stochastic Fictitious Play, Econometrica, vol.70, issue.6, pp.2265-2294, 2002.
DOI : 10.1111/1468-0262.00376
Nash Q-Learning for General-Sum Stochastic Games, Journal of Machine Learning Research, vol.4, pp.1039-1069, 2003.
Lisý. Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games, 2015.
Monte Carlo sampling for regret minimization in extensive games, Proc. of NIPS, 2009.
Julien Perolat, David Silver, and Thore Graepel. A unified game-theoretic approach to multiagent reinforcement learning, Proc. of NIPS, 2017.
The world of independent learners is not Markovian, International Journal of Knowledge-based and Intelligent Engineering Systems, vol.15, issue.1, pp.55-64, 2011.
DOI : 10.3233/KES-2010-0206
URL : https://hal.archives-ouvertes.fr/hal-00601941
Generalised weakened fictitious play, Games and Economic Behavior, vol.56, issue.2, pp.285-298, 2006.
DOI : 10.1016/j.geb.2005.08.005
URL : http://www.maths.bris.ac.uk/~madsl/research/LeslieCollinsGEB06.pdf
Convergence of Monte Carlo tree search in simultaneous move games, Proc. of NIPS, 2013.
Markov games as a framework for multi-agent reinforcement learning, Proc. of ICML, 1994.
DOI : 10.1016/B978-1-55860-335-6.50027-1
URL : http://www.ee.duke.edu/~lcarin/emag/seminar_presentations/Markov_Games_Littman.pdf
Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games, Proc. of ICML, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01153270
Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games, Proc. of AAMAS, 2015.
An Iterative Method of Solving a Game, The Annals of Mathematics, vol.54, issue.2, pp.296-301, 1951.
DOI : 10.2307/1969530
Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search, Proc. of ECML, 2014.
DOI : 10.1007/978-3-662-44845-8_3
URL : https://hal.archives-ouvertes.fr/hal-01091079
Some topics in two-person games, Advances in Game Theory, 1964.
If multi-agent learning is the answer, what is the question?, Artificial Intelligence, vol.171, issue.7, pp.365-377, 2007.
DOI : 10.1016/j.artint.2006.02.006
Regret minimization in games with incomplete information, Proc. of NIPS, 2007.