J. A. Bagnell, S. M. Kakade, J. G. Schneider, and A. Y. Ng, Policy Search by Dynamic Programming, Advances in Neural Information Processing Systems 16, 2004.

R. E. Bellman, Dynamic Programming, 1957.

D. S. Bernstein, S. Zilberstein, and N. Immerman, The Complexity of Decentralized Control of Markov Decision Processes, Proc. of the Sixteenth Conf. on Uncertainty in AI, 2000.
DOI : 10.1287/moor.27.4.819.297

G. W. Brown, Iterative Solutions of Games by Fictitious Play, Activity Analysis of Production and Allocation, 1951.

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally Solving Dec-POMDPs as Continuous-State MDPs, Journal of AI Research, vol.55, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00907338

J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, Counterfactual Multi-Agent Policy Gradients, 2017.

C. Guestrin, M. G. Lagoudakis, and R. Parr, Coordinated Reinforcement Learning, Proc. of the Nineteenth Int. Conf. on ML, 2002.

E. A. Hansen, D. S. Bernstein, and S. Zilberstein, Dynamic Programming for Partially Observable Stochastic Games, Proc. of the Nineteenth National Conf. on AI, 2004.

R. A. Howard, Dynamic Programming and Markov Processes, 1960.

J. Hu and M. P. Wellman, Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, Proc. of the Fifteenth Int. Conf. on ML, 1998.

J. R. Kok and N. Vlassis, Sparse Cooperative Q-learning, Proc. of the Twenty-First Int. Conf. on ML, 2004.

L. Kraemer and B. Banerjee, Multi-agent reinforcement learning as a rehearsal for decentralized planning, Neurocomputing, vol.190, pp.82-94, 2016.
DOI : 10.1016/j.neucom.2016.01.031

A. Kumar and S. Zilberstein, Constraint-based dynamic programming for decentralized POMDPs with structured interactions, Proc. of the Eighth Int. Conf. on Autonomous Agents and Multiagent Systems, 2009.

M. L. Littman, Markov games as a framework for multiagent reinforcement learning, Proc. of the Eleventh Int. Conf. on ML, 1994.

M. Liu, C. Amato, X. Liao, L. Carin, and J. P. How, Stick-Breaking Policy Learning in Dec-POMDPs, Proc. of the Twenty-Fourth Int. Joint Conf. on AI (IJCAI), 2015.

M. Liu, C. Amato, E. P. Anesta, J. D. Griffith, and J. P. How, Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments, AAAI, 2016.

L. C. MacDermed and C. Isbell, Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs, Advances in Neural Information Processing Systems 26, 2013.

T. Miconi, When Evolving Populations is Better Than Coevolving Individuals: The Blind Mice Problem, Proc. of the Eighteenth Int. Joint Conf. on AI (IJCAI'03), 2003.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, pp.529-533, 2015.
DOI : 10.1038/nature14236

I. Mordatch and P. Abbeel, Emergence of Grounded Compositional Language in Multi-Agent Populations, CoRR, abs/1703.04908, 2017.

A. Nayyar, A. Mahajan, and D. Teneketzis, Optimal Control Strategies in Delayed Sharing Information Structures, IEEE Transactions on Automatic Control, vol.56, issue.7, 2011.

F. A. Oliehoek, Sufficient Plan-Time Statistics for Decentralized POMDPs, Proc. of the Twenty-Third Int. Joint Conf. on AI, 2013.

F. A. Oliehoek, M. T. Spaan, and N. A. Vlassis, Optimal and Approximate Q-value Functions for Decentralized POMDPs, Journal of Artificial Intelligence Research, vol.32, 2008.
DOI : 10.1613/jair.2447

F. A. Oliehoek, M. T. Spaan, C. Amato, and S. Whiteson, Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs, Journal of AI Research, vol.46, 2013.

L. Panait and S. Luke, Cooperative Multi-Agent Learning: The State of the Art, Autonomous Agents and Multi-Agent Systems, vol.11, issue.3, 2005.
DOI : 10.1007/s10458-005-2631-2

L. Peshkin, K. Kim, N. Meuleau, and L. P. Kaelbling, Learning to Cooperate via Policy Search, Proc. of the Sixteenth Conf. on Uncertainty in AI (UAI-2000), 2000.

J. M. Porta, N. Vlassis, M. T. Spaan, and P. Poupart, Point-Based Value Iteration for Continuous POMDPs, J. Mach. Learn. Res, vol.7, 2006.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

R. Radner, Team Decision Problems, The Annals of Mathematical Statistics, vol.33, issue.3, 1962.
DOI : 10.1214/aoms/1177704455

H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, vol.22, issue.3, 1951.

R. T. Rockafellar, Convex analysis, Princeton Mathematical Series. Princeton, N. J, 1970.
DOI : 10.1515/9781400873173

G. A. Rummery and M. Niranjan, On-line Q-learning using connectionist systems, 1994.

R. Salustowicz, M. Wiering, and J. Schmidhuber, Learning Team Strategies: Soccer Case Studies, Machine Learning, vol.33, issue.2-3, 1998.

G. Shani, J. Pineau, and R. Kaplow, A survey of point-based POMDP solvers, Journal of Autonomous Agents and Multi-Agent Systems, vol.27, issue.1, 2013.
DOI : 10.1007/s10458-012-9200-2

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

D. Szer, F. Charpillet, and S. Zilberstein, MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs, Proc. of the Twenty-First Conf. on Uncertainty in AI, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00000204

M. Tan, Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents, Readings in Agents, 1998.
DOI : 10.1016/B978-1-55860-307-3.50049-6

C. J. C. H. Watkins and P. Dayan, Q-learning, Machine Learning, vol.8, issue.3-4, 1992.

F. Wu, S. Zilberstein, and N. R. Jennings, Monte-Carlo Expectation Maximization for Decentralized POMDPs, Proc. of the Twenty-Third Int. Joint Conf. on AI, 2013.

C. Zhang and V. Lesser, Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs, Proc. of the Twenty-Fifth AAAI Conf. on AI, 2011.