Natural Gradient Works Efficiently in Learning, Neural Computation, vol.37, issue.2, 1998. ,
DOI : 10.1103/PhysRevLett.76.2188
Incremental Policy Generation for Finite- Horizon DEC-POMDPs, Proc. of the Nineteenth Int. Conf. on Automated Planning and Scheduling, 2009. ,
Optimal Control of Markov Decision Processes with Incomplete State Estimation, Journal of Mathematical Analysis and Applications, vol.10, 1965. ,
The Theory of Dynamic Programming, Bulletin of the American Mathematical Society, vol.60, issue.6, 1954. ,
The Complexity of Decentralized Control of Markov Decision Processes, Mathematics of Operations Research, vol.27, issue.4, 2002. ,
DOI : 10.1287/moor.27.4.819.297
Planning, Learning and Coordination in Multiagent Decision Processes, Proc. of the Sixth Conf. on Theoretical Aspects of Rationality and Knowledge, 1996. ,
Linear off-policy actor-critic, Proc. of the 29th Int. Conf. on ML, ICML 2012, 2012. ,
Optimally Solving Dec-POMDPs as Continuous-State MDPs, Journal of AI Research, vol.55, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-00907338
Optimally Solving Dec-POMDPs As Continuous-state MDPs, Proc. of the Twenty-Fourth Int. Joint Conf. on AI, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00907338
Optimally solving Dec-POMDPs as Continuous-State MDPs: Theory and Algorithms, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00975802
Learning to Act in Decentralized Partially Observable MDPs, Research report INRIA, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01851806
Error-Bounded Approximations for Infinite-Horizon Discounted Decentralized POMDPs, Proc. of the Twenty-Fourth European Conf. on ML, 2014. ,
DOI : 10.1007/978-3-662-44848-9_22
URL : https://hal.archives-ouvertes.fr/hal-01096610
Counterfactual multi-agent policy gradients, 2018. ,
Cooperative Multi-agent Control Using Deep Reinforcement Learning, 2017. ,
DOI : 10.1109/TRA.2002.804040
Dynamic Programming for Partially Observable Stochastic Games, Proc. of the Nineteenth National Conf. on AI, 2004. ,
A Natural Policy Gradient, Advances in Neural Information Processing Systems, 2001. ,
Actor-critic algorithms, Adv. in Neural Information Processing Systems, 2000. ,
Multi-agent reinforcement learning as a rehearsal for decentralized planning, Neurocomputing, vol.190, 2016. ,
DOI : 10.1016/j.neucom.2016.01.031
URL : https://manuscript.elsevier.com/S0925231216000783/pdf/S0925231216000783.pdf
Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments, p.AAAI, 2016. ,
, Stick-breaking policy learning in Dec-POMDPs. In: Int. Joint Conf. on AI (IJCAI) 2015. AAAI, 2015.
Multi-agent actor-critic for mixed cooperative-competitive environments, Adv. in Neural Information Processing Systems, 2017. ,
Human-level control through deep reinforcement learning, Nature, vol.101, issue.7540, p.7540, 2015. ,
DOI : 10.1016/S0004-3702(98)00023-X
DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, Science, vol.29, issue.6337, 2017. ,
DOI : 10.1609/aimag.v31i4.2311
Policy gradient with value function approximation for collective multiagent planning, Adv. in Neural Information Processing Systems, 2017. ,
Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs, Journal of AI Research, vol.46, 2013. ,
Heuristic search for identical payoff Bayesian games, Proc. of the Ninth Int. Conf. on Autonomous Agents and Multiagent Systems, 2010. ,
Learning to Cooperate via Policy Search, Sixteenth Conf. on Uncertainty in Artificial Intelligence (UAI-2000), 2000. ,
A stochastic approximation method. The annals of mathematical statistics, 1951. ,
Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, 2008. ,
DOI : 10.1017/CBO9780511811654
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 2016. ,
DOI : 10.1109/TNN.1998.712192
Policy Gradient Methods for Reinforcement Learning with Function Approximation, Proc. of the 12th Int. Conf. on Neural Information Processing Systems, 1999. ,
An Optimal Best-First Search Algorithm for Solving Infinite Horizon DEC-POMDPs, Proc. of the Fifteenth European Conf. on ML, 2005. ,
DOI : 10.1007/11564096_38
URL : https://hal.archives-ouvertes.fr/inria-00000205
MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs, Proc. of the Twenty-First Conf. on Uncertainty in AI, 2005. ,
URL : https://hal.archives-ouvertes.fr/inria-00000204
Multi-agent Reinforcement Learning: Independent vs. Cooperative Agents. In: Readings in Agents, 1998. ,
Simple statistical gradient-following algorithms for connectionist reinforcement learning, 1992. ,
DOI : 10.1007/978-1-4615-3618-5_2
URL : http://www.cs.ualberta.ca/~sutton/williams-92.pdf
Monte-Carlo Expectation Maximization for Decentralized POMDPs, Proc. of the Twenty-Fourth Int. Joint Conf. on AI, 2013. ,
Conditional random fields for multi-agent reinforcement learning, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.249-6399 ,
DOI : 10.1145/1273496.1273640
URL : http://www.stat.purdue.edu/~vishy/papers/ZhaAbeVis07.pdf