S. I. Amari, Natural Gradient Works Efficiently in Learning, Neural Computation, vol.10, issue.2, 1998.
DOI : 10.1162/089976698300017746

C. Amato, J. S. Dibangoye, and S. Zilberstein, Incremental Policy Generation for Finite- Horizon DEC-POMDPs, Proc. of the Nineteenth Int. Conf. on Automated Planning and Scheduling, 2009.

K. J. Åström, Optimal Control of Markov Processes with Incomplete State Information, Journal of Mathematical Analysis and Applications, vol.10, 1965.

R. E. Bellman, The Theory of Dynamic Programming, Bulletin of the American Mathematical Society, vol.60, issue.6, 1954.

D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein, The Complexity of Decentralized Control of Markov Decision Processes, Mathematics of Operations Research, vol.27, issue.4, 2002.
DOI : 10.1287/moor.27.4.819.297

C. Boutilier, Planning, Learning and Coordination in Multiagent Decision Processes, Proc. of the Sixth Conf. on Theoretical Aspects of Rationality and Knowledge, 1996.

T. Degris, M. White, and R. S. Sutton, Linear off-policy actor-critic, Proc. of the 29th Int. Conf. on ML, 2012.

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally Solving Dec-POMDPs as Continuous-State MDPs, Journal of AI Research, vol.55, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00907338

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally Solving Dec-POMDPs as Continuous-State MDPs, Proc. of the Twenty-Third Int. Joint Conf. on AI, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00907338

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally Solving Dec-POMDPs as Continuous-State MDPs: Theory and Algorithms, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00975802

J. S. Dibangoye and O. Buffet, Learning to Act in Decentralized Partially Observable MDPs, Research Report, INRIA, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01851806

J. S. Dibangoye, O. Buffet, and F. Charpillet, Error-Bounded Approximations for Infinite-Horizon Discounted Decentralized POMDPs, Proc. of the Twenty-Fourth European Conf. on ML, 2014.
DOI : 10.1007/978-3-662-44848-9_22
URL : https://hal.archives-ouvertes.fr/hal-01096610

J. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, Counterfactual Multi-Agent Policy Gradients, Proc. of the Thirty-Second AAAI Conf. on AI, 2018.

J. K. Gupta, M. Egorov, and M. Kochenderfer, Cooperative Multi-agent Control Using Deep Reinforcement Learning, 2017.

E. A. Hansen, D. S. Bernstein, and S. Zilberstein, Dynamic Programming for Partially Observable Stochastic Games, Proc. of the Nineteenth National Conf. on AI, 2004.

S. Kakade, A Natural Policy Gradient, Advances in Neural Information Processing Systems, 2001.

V. R. Konda and J. N. Tsitsiklis, Actor-critic algorithms, Advances in Neural Information Processing Systems, 2000.

L. Kraemer and B. Banerjee, Multi-agent reinforcement learning as a rehearsal for decentralized planning, Neurocomputing, vol.190, 2016.
DOI : 10.1016/j.neucom.2016.01.031
URL : https://manuscript.elsevier.com/S0925231216000783/pdf/S0925231216000783.pdf

M. Liu, C. Amato, E. P. Anesta, J. D. Griffith, and J. P. How, Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments, Proc. of the Thirtieth AAAI Conf. on AI, 2016.

M. Liu, C. Amato, X. Liao, L. Carin, and J. P. How, Stick-Breaking Policy Learning in Dec-POMDPs, Proc. of the Twenty-Fourth Int. Joint Conf. on AI, 2015.

R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, Multi-agent actor-critic for mixed cooperative-competitive environments, Advances in Neural Information Processing Systems, 2017.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, 2015.
DOI : 10.1038/nature14236

M. Moravčík, M. Schmid, N. Burch, V. Lisý, D. Morrill et al., DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, Science, vol.356, issue.6337, 2017.
DOI : 10.1126/science.aam6960

D. T. Nguyen, A. Kumar, and H. C. Lau, Policy gradient with value function approximation for collective multiagent planning, Advances in Neural Information Processing Systems, 2017.

F. A. Oliehoek, M. T. Spaan, C. Amato, and S. Whiteson, Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs, Journal of AI Research, vol.46, 2013.

F. A. Oliehoek, M. T. Spaan, J. S. Dibangoye, and C. Amato, Heuristic search for identical payoff Bayesian games, Proc. of the Ninth Int. Conf. on Autonomous Agents and Multiagent Systems, 2010.

L. Peshkin, K. E. Kim, N. Meuleau, and L. P. Kaelbling, Learning to Cooperate via Policy Search, Proc. of the Sixteenth Conf. on Uncertainty in AI, 2000.

H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, vol.22, issue.3, 1951.

Y. Shoham and K. Leyton-Brown, Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, Cambridge University Press, 2008.
DOI : 10.1017/CBO9780511811654

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 2016.

R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, Proc. of the 12th Int. Conf. on Neural Information Processing Systems, 1999.

D. Szer and F. Charpillet, An Optimal Best-First Search Algorithm for Solving Infinite Horizon DEC-POMDPs, Proc. of the Sixteenth European Conf. on ML, 2005.
DOI : 10.1007/11564096_38
URL : https://hal.archives-ouvertes.fr/inria-00000205

D. Szer, F. Charpillet, and S. Zilberstein, MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs, Proc. of the Twenty-First Conf. on Uncertainty in AI, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00000204

M. Tan, Multi-agent Reinforcement Learning: Independent vs. Cooperative Agents, Readings in Agents, 1998.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, 1992.
DOI : 10.1007/BF00992696
URL : http://www.cs.ualberta.ca/~sutton/williams-92.pdf

F. Wu, S. Zilberstein, and N. R. Jennings, Monte-Carlo Expectation Maximization for Decentralized POMDPs, Proc. of the Twenty-Third Int. Joint Conf. on AI, 2013.

X. Zhang, D. Aberdeen, and S. V. N. Vishwanathan, Conditional random fields for multi-agent reinforcement learning, Proc. of the 24th Int. Conf. on ML, 2007.
DOI : 10.1145/1273496.1273640
URL : http://www.stat.purdue.edu/~vishy/papers/ZhaAbeVis07.pdf