S. Amari, Natural Gradient Works Efficiently in Learning, Neural Computation, vol.10, issue.2, pp.251-276, 1998.

K. J. Åström, Optimal Control of Markov Decision Processes with Incomplete State Estimation, Journal of Mathematical Analysis and Applications, vol.10, 1965.

R. E. Bellman, The Theory of Dynamic Programming, Bulletin of the American Mathematical Society, vol.60, issue.6, 1954.

D. S. Bernstein et al., The Complexity of Decentralized Control of Markov Decision Processes, Mathematics of Operations Research, vol.27, issue.4, 2002.

C. Boutilier, Planning, Learning and Coordination in Multiagent Decision Processes, Proc. of the Sixth Conf. on Theoretical Aspects of Rationality and Knowledge, 1996.

T. Degris, M. White, and R. S. Sutton, Linear Off-Policy Actor-Critic, Proc. of the 29th Int. Conf. on ML (ICML), 2012.

J. S. Dibangoye et al., Exploiting Separability in Multi-Agent Planning with Continuous-State MDPs, Proc. of the Thirteenth Int. Conf. on Autonomous Agents and Multiagent Systems, 2014.

J. S. Dibangoye et al., Optimally Solving Dec-POMDPs As Continuous-state MDPs, Proc. of the Twenty-Fourth Int. Joint Conf. on AI, 2013.

J. S. Dibangoye et al., Optimally solving Dec-POMDPs as Continuous-State MDPs: Theory and Algorithms.

J. S. Dibangoye et al., Optimally Solving Dec-POMDPs as Continuous-State MDPs, Journal of AI Research, vol.55, 2016.

J. N. Foerster et al., Counterfactual Multi-Agent Policy Gradients, 2017.

J. K. Gupta, M. Egorov, and M. Kochenderfer, Cooperative Multi-agent Control Using Deep Reinforcement Learning, Autonomous Agents and Multiagent Systems, 2017.

E. A. Hansen, D. S. Bernstein, and S. Zilberstein, Dynamic Programming for Partially Observable Stochastic Games, Proc. of the Nineteenth National Conf. on AI, 2004.

S. Kakade, A Natural Policy Gradient, Advances in Neural Information Processing Systems, 2001.

V. R. Konda and J. N. Tsitsiklis, Actor-Critic Algorithms, Advances in Neural Information Processing Systems 12, 2000.

L. Kraemer and B. Banerjee, Multiagent reinforcement learning as a rehearsal for decentralized planning, Neurocomputing, vol.190, 2016.

R. Lowe et al., Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Advances in Neural Information Processing Systems 30, 2017.

V. Mnih et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, pp.529-533, 2015.

M. Moravčík et al., DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, Science, vol.356, issue.6337, 2017.

F. A. Oliehoek et al., Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs, Journal of AI Research, vol.46, 2013.

L. Peshkin et al., Learning to Cooperate via Policy Search, Sixteenth Conf. on Uncertainty in Artificial Intelligence (UAI-2000), 2000.

H. Robbins and S. Monro, A stochastic approximation method, The Annals of Mathematical Statistics, 1951.

Y. Shoham and K. Leyton-Brown, Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, 2008.

R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 2nd ed., 2016.

R. S. Sutton et al., Policy Gradient Methods for Reinforcement Learning with Function Approximation, Proc. of the 12th Int. Conf. on Neural Information Processing Systems, 1999.

D. Szer and F. Charpillet, An Optimal Best-First Search Algorithm for Solving Infinite Horizon DEC-POMDPs, Proc. of the Fifteenth European Conf. on ML, 2005.

D. Szer, F. Charpillet, and S. Zilberstein, MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs, Proc. of the Twenty-First Conf. on Uncertainty in AI, 2005.

M. Tan, Multi-agent Reinforcement Learning: Independent vs. Cooperative Agents, Readings in Agents, 1998.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, 1992.

F. Wu, S. Zilberstein, and N. R. Jennings, Monte-Carlo Expectation Maximization for Decentralized POMDPs, Proc. of the Twenty-Fourth Int. Joint Conf. on AI, 2013.

X. Zhang, D. Aberdeen, and S. V. Vishwanathan, Conditional Random Fields for Multi-agent Reinforcement Learning, Proc. of the 24th Int. Conf. on Machine Learning, 2007.