Natural Gradient Works Efficiently in Learning, Neural Comput, vol.10, issue.2, pp.899-7667, 1998. ,
Optimal Control of Markov Decision Processes with Incomplete State Estimation ,
, Journal of Mathematical Analysis and Applications, vol.10, 1965.
The Theory of Dynamic Programming, In : Bulletin of the American Mathematical Society, vol.60, p.6, 1954. ,
The Complexity of Decentralized Control of Markov Decision Processes, In : Mathematics of Operations Research, vol.27, issue.4, 2002. ,
Planning, Learning and Coordination in Multiagent Decision Processes, Proc. of the Sixth Conf. on Theoretical Aspects of Rationality and Knowledge, 1996. ,
Linear Off-Policy Actor-Critic, Proc. of the 29th Int. Conf. on ML, ICML 2012 ,
Exploiting Separability in Multi-Agent Planning with Continuous- State MDPs, Proc. of the Thirteenth Int. Conf. on Autonomous Agents and Multiagent Systems, 2014. ,
Optimally Solving Dec-POMDPs As Continuous-state MDPs, Proc. of the Twenty-Fourth Int. Joint Conf. on AI, 2013. ,
Optimally solving Dec-POMDPs as Continuous-State MDPs : Theory and Algorithms ,
Optimally Solving Dec- POMDPs as Continuous-State MDPs, Journal of AI Research, vol.55, 2016. ,
Counterfactual Multi- Agent Policy Gradients, 2017. ,
Cooperative Multi-agent Control Using Deep Reinforcement Learning, Autonomous Agents and Multiagent Systems, 2017. ,
Dynamic Programming for Partially Observable Stochastic Games, Proc. of the Nineteenth National Conf. on AI, 2004. ,
A Natural Policy Gradient, Advances in Neural Information Processing Systems, 2001. ,
Actor- Critic Algorithms, Advances in Neural Information Processing Systems 12, 2000. ,
Multiagent reinforcement learning as a rehearsal for decentralized planning, Neurocomputing, vol.190, 2016. ,
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Advances in Neural Information Processing Systems 30, 2017. ,
Human-level control through deep reinforcement learning, Nature, vol.5187540, pp.28-0836 ,
DeepStack : Expert-level artificial intelligence in heads-up no-limit poker, In : Science, vol.356, p.6337, 2017. ,
Incremental Clustering and Expansion for Faster Optimal Planning in Dec- POMDPs, Journal of AI Research, vol.46, 2013. ,
Learning to Cooperate via Policy Search, Sixteenth Conf. on Uncertainty in Artificial Intelligence (UAI-2000, 2000. ,
A stochastic approximation method, The annals of mathematical statistics, 1951. ,
Multiagent Systems : Algorithmic, Game-Theoretic, and Logical Foundations, 2008. ,
Introduction to Reinforcement Learning. 2nd, 2016. ,
Policy Gradient Methods for Reinforcement Learning with Function Approximation, Proc. of the 12th Int. Conf. on Neural Information Processing Systems, 1999. ,
An Optimal Best-First Search Algorithm for Solving Infinite Horizon DEC-POMDPs, Proc. of the Fifteenth European Conf. on ML, 2005. ,
MAA* : A Heuristic Search Algorithm for Solving Decentralized POMDPs, Proc. of the Twenty-First Conf. on Uncertainty in AI, 2005. ,
Multi-agent Reinforcement Learning : Independent vs. Cooperative Agents " . In : Readings in Agents, 1998. ,
Simple statistical gradientfollowing algorithms for connectionist reinforcement learning, 1992. ,
Monte-Carlo Expectation Maximization for Decentralized POMDPs, Proc. of the Twenty-Fourth Int. Joint Conf. on AI, 2013. ,
Conditional Random Fields for Multi-agent Reinforcement Learning, Proc. of the 24th international conference on Machine learning, 2007. ,