Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, issue.3-4, pp.229-256, 1992.
DOI : 10.1007/978-1-4615-3618-5_2
Policy gradient methods for reinforcement learning with function approximation, Advances in Neural Information Processing Systems 12, pp.1057-1063, 2000.
Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001.
Policy Gradients with Parameter-Based Exploration for Control, Artificial Neural Networks - ICANN 2008, pp.387-396, 2008.
DOI : 10.1007/978-3-540-87536-9_40
Policy search for motor primitives in robotics, Advances in Neural Information Processing Systems 21, pp.849-856, 2008.
DOI : 10.1007/s10994-010-5223-6
URL : https://link.springer.com/content/pdf/10.1007%2Fs10994-010-5223-6.pdf
Natural Actor-Critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008.
DOI : 10.1016/j.neucom.2007.11.026
Relative entropy policy search, Proceedings of the 24th AAAI Conference on Artificial Intelligence, 2010.
A survey of actor-critic reinforcement learning: Standard and natural policy gradients, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol.42, issue.6, pp.1291-1307, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00756747
A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, vol.2, issue.1-2, pp.1-142, 2013.
DOI : 10.1561/2300000021
Approximately optimal approximate reinforcement learning, International Conference on Machine Learning, pp.267-274, 2002.
Approximate policy iteration: A survey and some new methods, Journal of Control Theory and Applications, vol.9, issue.3, pp.310-335, 2011.
Approximate policy iteration schemes: A comparison, Proceedings of the 31st International Conference on Machine Learning, pp.1314-1322, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00989982
Safe policy iteration, Proceedings of the 30th International Conference on Machine Learning, pp.307-315, 2013.
A fast and reliable policy improvement algorithm, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp.1338-1346, 2016.
Adaptive step-size for policy gradient methods, Advances in Neural Information Processing Systems 26, pp.1394-1402, 2013.
Policy gradient in Lipschitz Markov Decision Processes, Machine Learning, pp.255-283, 2015.
DOI : 10.1007/s10994-015-5484-1
URL : https://link.springer.com/content/pdf/10.1007%2Fs10994-015-5484-1.pdf
Trust region policy optimization, Proceedings of the 32nd International Conference on Machine Learning, pp.1889-1897, 2015.
High confidence policy improvement, Proceedings of the 32nd International Conference on Machine Learning, pp.2380-2388, 2015.
Safe policy improvement by minimizing robust baseline regret, Advances in Neural Information Processing Systems 29, pp.2298-2306, 2016.
Coordinate descent converges faster with the Gauss-Southwell rule than random selection, Proceedings of the 32nd International Conference on Machine Learning, pp.1632-1641, 2015.
Analysis and improvement of policy gradient estimation, Neural Networks, vol.26, pp.118-129, 2012.
DOI : 10.1016/j.neunet.2011.09.005
Empirical Bernstein stopping, Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp.672-679, 2008.
DOI : 10.1145/1390156.1390241
URL : https://hal.archives-ouvertes.fr/hal-00834983
Reinforcement learning of motor skills with policy gradients, Neural Networks, vol.21, issue.4, pp.682-697, 2008.
DOI : 10.1016/j.neunet.2008.02.003
Information and Information Stability of Random Variables and Processes, Izv. Akad. Nauk, 1960.