S. I. Amari, Natural Gradient Works Efficiently in Learning, Neural Computation, vol.10, issue.2, 1998.
DOI : 10.1162/089976698300017746

K. J. Åström, Optimal Control of Markov Decision Processes with Incomplete State Estimation, Journal of Mathematical Analysis and Applications, vol.10, 1965.

R. E. Bellman, The Theory of Dynamic Programming, Bulletin of the American Mathematical Society, vol.60, issue.6, 1954.

D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein, The Complexity of Decentralized Control of Markov Decision Processes, Mathematics of Operations Research, vol.27, issue.4, 2002.
DOI : 10.1287/moor.27.4.819.297

C. Boutilier, Planning, Learning and Coordination in Multiagent Decision Processes, Proc. of the Sixth Conf. on Theoretical Aspects of Rationality and Knowledge, 1996.

T. Degris, M. White, and R. S. Sutton, Linear off-policy actor-critic, Proc. of the 29th Int. Conf. on ML (ICML), 2012.

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally Solving Dec-POMDPs as Continuous-State MDPs, Journal of AI Research, vol.55, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00907338

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally Solving Dec-POMDPs as Continuous-State MDPs, Proc. of the Twenty-Third Int. Joint Conf. on AI, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00907338

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Exploiting Separability in Multi-Agent Planning with Continuous-State MDPs, Proc. of the Thirteenth Int. Conf. on Autonomous Agents and Multiagent Systems, 2014.

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally Solving Dec-POMDPs as Continuous-State MDPs: Theory and Algorithms, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00975802

J. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, Counterfactual multi-agent policy gradients, Proc. of the Thirty-Second AAAI Conf. on AI, 2018.

J. K. Gupta, M. Egorov, and M. Kochenderfer, Cooperative Multi-agent Control Using Deep Reinforcement Learning, 2017.

E. A. Hansen, D. S. Bernstein, and S. Zilberstein, Dynamic Programming for Partially Observable Stochastic Games, Proc. of the Nineteenth National Conf. on AI, 2004.

S. Kakade, A Natural Policy Gradient, Adv. in Neural Information Processing Systems, 2001.

V. R. Konda and J. N. Tsitsiklis, Actor-critic algorithms, Adv. in Neural Information Processing Systems, 2000.

L. Kraemer and B. Banerjee, Multi-agent reinforcement learning as a rehearsal for decentralized planning, Neurocomputing, vol.190, 2016.
DOI : 10.1016/j.neucom.2016.01.031

R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, Multi-agent actor-critic for mixed cooperative-competitive environments, Adv. in Neural Information Processing Systems, 2017.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, 2015.
DOI : 10.1038/nature14236

M. Moravčík, M. Schmid, N. Burch, V. Lisý, D. Morrill et al., DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, Science, vol.356, issue.6337, 2017.
DOI : 10.1126/science.aam6960

D. T. Nguyen, A. Kumar, and H. C. Lau, Policy gradient with value function approximation for collective multiagent planning, Adv. in Neural Information Processing Systems, 2017.

F. A. Oliehoek, M. T. Spaan, C. Amato, and S. Whiteson, Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs, Journal of AI Research, vol.46, 2013.

L. Peshkin, K. E. Kim, N. Meuleau, and L. P. Kaelbling, Learning to Cooperate via Policy Search, Proc. of the Sixteenth Conf. on Uncertainty in AI, 2000.

H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, vol.22, issue.3, 1951.

Y. Shoham and K. Leyton-Brown, Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, Cambridge University Press, 2008.
DOI : 10.1017/CBO9780511811654

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 2016.

R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, Adv. in Neural Information Processing Systems, 1999.

D. Szer and F. Charpillet, An Optimal Best-First Search Algorithm for Solving Infinite Horizon DEC-POMDPs, Proc. of the Sixteenth European Conf. on ML, 2005.
DOI : 10.1007/11564096_38
URL : https://hal.archives-ouvertes.fr/inria-00000205

D. Szer, F. Charpillet, and S. Zilberstein, MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs, Proc. of the Twenty-First Conf. on Uncertainty in AI, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00000204

M. Tan, Multi-agent Reinforcement Learning: Independent vs. Cooperative Agents, Readings in Agents, 1998.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, 1992.

F. Wu, S. Zilberstein, and N. R. Jennings, Monte-Carlo Expectation Maximization for Decentralized POMDPs, Proc. of the Twenty-Third Int. Joint Conf. on AI, 2013.

X. Zhang, D. Aberdeen, and S. V. Vishwanathan, Conditional random fields for multi-agent reinforcement learning, Proc. of the 24th Int. Conf. on Machine Learning (ICML), 2007.
DOI : 10.1145/1273496.1273640