C. Amato, J. S. Dibangoye, and S. Zilberstein, Incremental Policy Generation for Finite-Horizon DEC-POMDPs, ICAPS, 2009.

D. S. Bernstein, S. Zilberstein, and N. Immerman, The Complexity of Decentralized Control of Markov Decision Processes, UAI, 2000.
DOI : 10.1287/moor.27.4.819.297

G. W. Brown, Iterative Solutions of Games by Fictitious Play, Activity Analysis of Production and Allocation, 1951.

J. S. Dibangoye, A.-I. Mouaddib, and B. Chaib-draa, Point-based incremental pruning heuristic for solving finite-horizon DEC-POMDPs, AAMAS, pp.569-576, 2009.

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally Solving Dec-POMDPs As Continuous-State MDPs, IJCAI, pp.90-96, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00907338

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally Solving Dec-POMDPs as Continuous-State MDPs: Theory and Algorithms, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00975802

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Exploiting Separability in Multiagent Planning with Continuous-State MDPs (Extended Abstract), IJCAI, pp.4254-4260, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01188483

J. S. Dibangoye, O. Buffet, and O. Simonin, Structural Results for Cooperative Decentralized Control Models, IJCAI, pp.46-52, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01188481

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally Solving Dec-POMDPs as Continuous-State MDPs, Journal of AI Research, vol.55, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00907338

J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, Counterfactual Multi-Agent Policy Gradients, 2017.

C. Guestrin, M. G. Lagoudakis, and R. Parr, Coordinated Reinforcement Learning, ICML, 2002.

E. A. Hansen, D. S. Bernstein, and S. Zilberstein, Dynamic Programming for Partially Observable Stochastic Games, AAAI, 2004.

R. A. Howard, Dynamic Programming and Markov Processes, 1960.

J. Hu and M. P. Wellman, Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, ICML, 1998.

J. R. Kok and N. Vlassis, Sparse cooperative Q-learning, ICML, 2004.
DOI : 10.1145/1015330.1015410

L. Kraemer and B. Banerjee, Multi-agent reinforcement learning as a rehearsal for decentralized planning, Neurocomputing, vol.190, 2016.
DOI : 10.1016/j.neucom.2016.01.031

A. Kumar and S. Zilberstein, Constraint-based dynamic programming for decentralized POMDPs with structured interactions, AAMAS, 2009.

M. L. Littman, Markov games as a framework for multi-agent reinforcement learning, ICML, 1994.
DOI : 10.1016/B978-1-55860-335-6.50027-1

M. Liu, C. Amato, X. Liao, L. Carin, and J. P. How, Stick-breaking policy learning in Dec-POMDPs, IJCAI, 2015.

M. Liu, C. Amato, E. P. Anesta, J. D. Griffith, and J. P. How, Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments, AAAI, 2016.

L. C. Macdermed and C. Isbell, Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs, NIPS, 2013.

T. Miconi, When Evolving Populations is Better Than Coevolving Individuals: The Blind Mice Problem, IJCAI, 2003.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, 2015.
DOI : 10.1038/nature14236

I. Mordatch and P. Abbeel, Emergence of Grounded Compositional Language in Multi-Agent Populations, CoRR, abs/1703, 2017.

A. Nayyar, A. Mahajan, and D. Teneketzis, Optimal Control Strategies in Delayed Sharing Information Structures, IEEE Transactions on Automatic Control, vol.56, issue.7, 2011.
DOI : 10.1109/tac.2010.2089381

URL : http://arxiv.org/pdf/1002.4172

D. T. Nguyen, A. Kumar, and H. C. Lau, Policy gradient with value function approximation for collective multiagent planning, NIPS, pp.4319-4329, 2017.

F. A. Oliehoek, Sufficient Plan-Time Statistics for Decentralized POMDPs, IJCAI, 2013.

F. A. Oliehoek, M. T. Spaan, and N. A. Vlassis, Optimal and Approximate Q-value Functions for Decentralized POMDPs, Journal of AI Research, vol.32, 2008.

F. A. Oliehoek, M. T. Spaan, J. S. Dibangoye, and C. Amato, Heuristic search for identical payoff Bayesian games, AAMAS, pp.1115-1122, 2010.

F. A. Oliehoek, M. T. Spaan, C. Amato, and S. Whiteson, Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs, Journal of AI Research, vol.46, 2013.

L. Panait and S. Luke, Cooperative Multi-Agent Learning: The State of the Art, Autonomous Agents and Multi-Agent Systems, vol.11, issue.3, 2005.

L. Peshkin, K. Kim, N. Meuleau, and L. P. Kaelbling, Learning to Cooperate via Policy Search, UAI, 2000.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

R. Radner, Team Decision Problems, The Annals of Mathematical Statistics, vol.33, issue.3, 1962.
DOI : 10.1214/aoms/1177704455

H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, 1951.

R. T. Rockafellar, Convex Analysis, Princeton Mathematical Series, Princeton, N.J., 1970.
DOI : 10.1515/9781400873173

G. A. Rummery and M. Niranjan, On-line Q-learning using connectionist systems, 1994.

R. Salustowicz, M. Wiering, and J. Schmidhuber, Learning Team Strategies: Soccer Case Studies, Machine Learning, vol.33, issue.2-3, 1998.

G. Shani, J. Pineau, and R. Kaplow, A survey of point-based POMDP solvers, Autonomous Agents and Multi-Agent Systems, vol.27, issue.1, 2013.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

D. Szer, F. Charpillet, and S. Zilberstein, MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs, UAI, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00000204

M. Tan, Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents, Readings in Agents, 1998.
DOI : 10.1016/B978-1-55860-307-3.50049-6

F. Wu, S. Zilberstein, and N. R. Jennings, Monte-Carlo Expectation Maximization for Decentralized POMDPs, IJCAI, 2013.

C. Zhang and V. Lesser, Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs, AAAI, 2011.