J. A. Bagnell and J. Schneider, Covariant policy search, Proceedings of the International Joint Conference on Artificial Intelligence, 2003.

S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, Incremental natural actor-critic algorithms, Neural Information Processing Systems 20, 2007.

T. Degris, P. M. Pilarski, and R. S. Sutton, Model-free reinforcement learning with continuous action in practice, 2012 American Control Conference (ACC), 2012.
DOI : 10.1109/ACC.2012.6315022
URL : https://hal.archives-ouvertes.fr/hal-00764281

T. Degris, M. White, and R. S. Sutton, Linear off-policy actor-critic, 29th International Conference on Machine Learning, 2012.

Y. Engel, P. Szabó, and D. Volkinshtein, Learning to control an octopus arm with Gaussian process temporal difference methods, Neural Information Processing Systems 18, 2005.

R. Hafner and M. Riedmiller, Reinforcement learning in feedback control, Machine Learning, pp.137-169, 2011.
DOI : 10.1007/s10994-011-5235-x

N. Heess, D. Silver, and Y. Teh, Actor-critic reinforcement learning with energy-based policies, Conference Proceedings: EWRL 2012, pp.43-58, 2012.

S. Kakade, A natural policy gradient, Neural Information Processing Systems 14, pp.1531-1538, 2001.

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

H. R. Maei, C. Szepesvári, S. Bhatnagar, and R. S. Sutton, Toward off-policy learning control with function approximation, 27th International Conference on Machine Learning, pp.719-726, 2010.

J. Peters, Policy gradient methods, Scholarpedia, vol.5, issue.11, p.3698, 2010.
DOI : 10.4249/scholarpedia.3698

J. Peters, S. Vijayakumar, and S. Schaal, Natural actor-critic, 16th European Conference on Machine Learning, pp.280-291, 2005.
DOI : 10.1016/j.neucom.2007.11.026

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver et al., Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, p.125, 2009.
DOI : 10.1145/1553374.1553501

R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, Neural Information Processing Systems 12, pp.1057-1063, 1999.

R. S. Sutton, S. P. Singh, and D. A. McAllester, Comparing policy-gradient algorithms, 2000.

M. Toussaint, Some notes on gradient descent, 2012.

C. Watkins and P. Dayan, Q-learning, Machine Learning, pp.279-292, 1992.

P. J. Werbos, A menu of designs for reinforcement learning over time, Neural networks for control, pp.67-95, 1990.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, pp.229-256, 1992.

T. Zhao, H. Hachiya, G. Niu, and M. Sugiyama, Analysis and improvement of policy gradient estimation, Neural Networks, vol.26, pp.118-129, 2012.
DOI : 10.1016/j.neunet.2011.09.005