R. S. Sutton, A. Koop, and D. Silver, On the role of tracking in stationary environments, Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp.871-878, 2007.
DOI : 10.1145/1273496.1273606

N. Kohl and P. Stone, Policy gradient reinforcement learning for fast quadrupedal locomotion, IEEE International Conference on Robotics and Automation (ICRA '04), pp.2619-2624, 2004.
DOI : 10.1109/ROBOT.2004.1307456

R. Tedrake, T. Zhang, and H. Seung, Stochastic policy gradient reinforcement learning on a simple 3D biped, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.2849-2854, 2004.
DOI : 10.1109/IROS.2004.1389841

J. Peters and S. Schaal, Natural Actor-Critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008.
DOI : 10.1016/j.neucom.2007.11.026

P. Abbeel, A. Coates, and A. Y. Ng, Autonomous Helicopter Aerobatics through Apprenticeship Learning, The International Journal of Robotics Research, vol.29, issue.13, pp.1608-1639, 2010.
DOI : 10.1177/0278364910371999

M. Bowling and M. Veloso, Simultaneous Adversarial Multi-robot Learning, International Joint Conference on Artificial Intelligence, pp.699-704, 2003.

R. S. Sutton and S. D. Whitehead, Online Learning with Random Representations, Proceedings of the Tenth International Conference on Machine Learning, pp.314-321, 1993.
DOI : 10.1016/B978-1-55860-307-3.50047-2

S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, Natural actor-critic algorithms, Automatica, vol.45, issue.11, pp.2471-2482, 2009.
DOI : 10.1016/j.automatica.2009.07.008

URL : https://hal.archives-ouvertes.fr/hal-00840470

R. Williams, Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Machine Learning, vol.8, issue.3-4, pp.229-256, 1992.
DOI : 10.1007/978-1-4615-3618-5_2


H. Kimura, T. Yamashita, and S. Kobayashi, Reinforcement learning of walking behavior for a four-legged robot, Proceedings of the 40th IEEE Conference on Decision and Control (CDC), pp.411-416, 2001.
DOI : 10.1109/CDC.2001.980135

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.

R. S. Sutton, D. Mcallester, S. Singh, and Y. Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, Advances in Neural Information Processing Systems, pp.1057-1063, 2000.

R. S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, vol.34, issue.1, pp.9-44, 1988.
DOI : 10.1007/BF00115009

J. Baxter and P. Bartlett, Direct gradient-based reinforcement learning, 2000 IEEE International Symposium on Circuits and Systems (ISCAS 2000), pp.271-274, 2000.
DOI : 10.1109/ISCAS.2000.856049

K. Doya, Reinforcement Learning in Continuous Time and Space, Neural Computation, vol.12, issue.1, pp.219-245, 2000.
DOI : 10.1162/089976600300015961

T. Tamei and T. Shibata, Fast Reinforcement Learning for Three-Dimensional Kinetic Human-Robot Cooperation with an EMG-to-Activation Model, Advanced Robotics, vol.25, issue.5, pp.563-580, 2011.
DOI : 10.1163/016918611X558252

P. M. Pilarski, M. R. Dawson, T. Degris, F. Fahimi, J. P. Carey et al., Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning, 2011 IEEE International Conference on Rehabilitation Robotics, pp.134-140, 2011.
DOI : 10.1109/ICORR.2011.5975338