C. Amato, J. S. Dibangoye, and S. Zilberstein, Incremental Policy Generation for Finite-Horizon DEC-POMDPs, ICAPS, 2009.

J. A. Bagnell, S. M. Kakade, J. G. Schneider, and A. Ng, Policy Search by Dynamic Programming. In NIPS, 2004.

R. E. Bellman, Dynamic Programming, 1957.

D. S. Bernstein, S. Zilberstein, and N. Immerman, The Complexity of Decentralized Control of Markov Decision Processes, UAI, 2000.
DOI : 10.1287/moor.27.4.819.297

G. Bono, J. S. Dibangoye, L. Matignon, F. Pereyron, and O. Simonin, On the Study of Cooperative Multi-Agent Policy Gradient, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01821677

G. W. Brown, Iterative Solutions of Games by Fictitious Play, Activity Analysis of Production and Allocation, 1951.

J. S. Dibangoye and O. Buffet, Learning to Act in Decentralized Partially Observable MDPs, Research report, INRIA Grenoble - Rhône-Alpes - CHROMA Team, 2018.

J. S. Dibangoye, A. Mouaddib, and B. Chaib-draa, Point-based incremental pruning heuristic for solving finite-horizon DEC-POMDPs, AAMAS, pp. 569-576, 2009.

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally Solving Dec-POMDPs as Continuous-State MDPs, IJCAI, pp. 90-96, 2013.
DOI : 10.1613/jair.4623

URL : https://hal.archives-ouvertes.fr/hal-00907338

J. S. Dibangoye, O. Buffet, and F. Charpillet, Error-Bounded Approximations for Infinite-Horizon Discounted Decentralized POMDPs, ECML, pp. 338-353, 2014.
DOI : 10.1007/978-3-662-44848-9_22

URL : https://hal.archives-ouvertes.fr/hal-01096610

J. S. Dibangoye, O. Buffet, and O. Simonin, Structural Results for Cooperative Decentralized Control Models, IJCAI, pp.46-52, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01188481

J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally Solving Dec-POMDPs as Continuous-State MDPs, Journal of Artificial Intelligence Research, vol. 55, 2016.
DOI : 10.1613/jair.4623

URL : https://hal.archives-ouvertes.fr/hal-00907338

J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, Counterfactual Multi-Agent Policy Gradients, 2017.

C. Guestrin, M. G. Lagoudakis, and R. Parr, Coordinated Reinforcement Learning, ICML, 2002.

E. A. Hansen, D. S. Bernstein, and S. Zilberstein, Dynamic Programming for Partially Observable Stochastic Games, AAAI, 2004.

R. A. Howard, Dynamic Programming and Markov Processes, 1960.

J. Hu and M. P. Wellman, Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, ICML, 1998.

J. R. Kok and N. Vlassis, Sparse Cooperative Q-learning, ICML, 2004.
DOI : 10.1145/1015330.1015410

L. Kraemer and B. Banerjee, Multi-agent reinforcement learning as a rehearsal for decentralized planning, Neurocomputing, vol.190, 2016.
DOI : 10.1016/j.neucom.2016.01.031

A. Kumar and S. Zilberstein, Constraint-based dynamic programming for decentralized POMDPs with structured interactions, AAMAS, 2009.

M. L. Littman, Markov games as a framework for multiagent reinforcement learning, ICML, 1994.

M. Liu, C. Amato, X. Liao, L. Carin, and J. P. How, Stick-Breaking Policy Learning in Dec-POMDPs, IJCAI, 2015.

M. Liu, C. Amato, E. P. Anesta, J. D. Griffith, and J. P. How, Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments, AAAI, 2016.

L. C. Macdermed and C. Isbell, Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs, NIPS, 2013.

T. Miconi, When Evolving Populations is Better Than Coevolving Individuals: The Blind Mice Problem, IJCAI, 2003.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol. 518, no. 7540, pp. 529-533, 2015.
DOI : 10.1038/nature14236

I. Mordatch and P. Abbeel, Emergence of Grounded Compositional Language in Multi-Agent Populations, arXiv:1703, 2017.

A. Nayyar, A. Mahajan, and D. Teneketzis, Optimal Control Strategies in Delayed Sharing Information Structures, IEEE Transactions on Automatic Control, vol. 56, no. 7, 2011.

D. T. Nguyen, A. Kumar, and H. C. Lau, Policy Gradient with Value Function Approximation for Collective Multiagent Planning, NIPS, pp. 4319-4329, 2017.

F. A. Oliehoek, Sufficient Plan-Time Statistics for Decentralized POMDPs, IJCAI, 2013.

F. A. Oliehoek, M. T. Spaan, and N. A. Vlassis, Optimal and Approximate Q-value Functions for Decentralized POMDPs, Journal of Artificial Intelligence Research, vol.32, 2008.
DOI : 10.1613/jair.2447

URL : http://orbilu.uni.lu/bitstream/10993/11026/1/live-2447-3856-jair.pdf

F. A. Oliehoek, M. T. Spaan, J. S. Dibangoye, and C. Amato, Heuristic search for identical payoff Bayesian games, AAMAS, pp.1115-1122, 2010.

F. A. Oliehoek, M. T. Spaan, C. Amato, and S. Whiteson, Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs, Journal of Artificial Intelligence Research, vol.46, 2013.
DOI : 10.1613/jair.3804

URL : https://jair.org/index.php/jair/article/download/10806/25794

L. Panait and S. Luke, Cooperative Multi-Agent Learning: The State of the Art, Autonomous Agents and Multi-Agent Systems, vol. 11, issue 3, 2005.

L. Peshkin, K. Kim, N. Meuleau, and L. P. Kaelbling, Learning to Cooperate via Policy Search, UAI, 2000.

M. L. Puterman, Markov Decision Processes, Discrete Stochastic Dynamic Programming, 1994.

R. Radner, Team Decision Problems, The Annals of Mathematical Statistics, vol.33, issue.3, 1962.
DOI : 10.1214/aoms/1177704455

URL : https://doi.org/10.1214/aoms/1177704455

H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, 1951.

G. A. Rummery and M. Niranjan, On-line Q-learning using connectionist systems, 1994.

R. Salustowicz, M. Wiering, and J. Schmidhuber, Learning Team Strategies: Soccer Case Studies, Machine Learning, vol. 33, issue 2-3, 1998.

G. Shani, J. Pineau, and R. Kaplow, A Survey of Point-Based POMDP Solvers, Journal of Autonomous Agents and Multi-Agent Systems, vol. 27, issue 1, 2013.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

D. Szer, F. Charpillet, and S. Zilberstein, MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs, UAI, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00000204

M. Tan, Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents, Readings in Agents, 1998.
DOI : 10.1016/B978-1-55860-307-3.50049-6

C. J. Watkins and P. Dayan, Q-learning, Machine Learning, vol. 8, 1992.

F. Wu, S. Zilberstein, and N. R. Jennings, Monte-Carlo Expectation Maximization for Decentralized POMDPs, IJCAI, 2013.

C. Zhang and V. Lesser, Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs, AAAI, 2011.