J. Baxter and P. L. Bartlett, Infinite-horizon gradient-based policy search, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001.

A. Bensoussan, Perturbation methods in optimal control, Wiley/Gauthier-Villars Series in Modern Applied Mathematics, 1988.

A. Bogdanov, Optimal control of a double inverted pendulum on a cart, CSEE, OGI School of Science and Engineering, 2004.

P. W. Glynn, Likelihood ratio gradient estimation: an overview, Proceedings of the 1987 Winter Simulation Conference, pp.366-375, 1987.

E. Gobet and R. Munos, Sensitivity Analysis Using Itô-Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control, SIAM Journal on Control and Optimization, vol.43, issue.5, pp.1676-1713, 2005.
DOI : 10.1137/S0363012902419059

P. E. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations, 1995.

H. J. Kushner and G. Yin, Stochastic Approximation Algorithms and Applications, 1997.
DOI : 10.1007/978-1-4899-2696-8

S. M. LaValle, Planning Algorithms, 2006.
DOI : 10.1017/CBO9780511546877

M. Ledoux, The concentration of measure phenomenon, 2001.
DOI : 10.1090/surv/089

P. Marbach and J. N. Tsitsiklis, Approximate gradient methods in policy-space optimization of Markov reward processes, Discrete Event Dynamic Systems, vol.13, issue.1/2, pp.111-148, 2003.
DOI : 10.1023/A:1022145020786

B. T. Polyak, Introduction to Optimization, Optimization Software Inc., 1987.

M. I. Reiman and A. Weiss, Sensitivity analysis via likelihood ratios, Proceedings of the 18th Conference on Winter Simulation, WSC '86, pp.285-289, 1986.
DOI : 10.1145/318242.318450

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, Neural Information Processing Systems, pp.1057-1063, 2000.

M. Talagrand, A new look at independence, Annals of Probability, pp.1-34, 1996.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, pp.229-256, 1992.

J. Yang and H. J. Kushner, A Monte Carlo Method for Sensitivity Analysis and Parametric Optimization of Nonlinear Stochastic Systems, SIAM Journal on Control and Optimization, vol.29, issue.5, pp.1216-1249, 1991.
DOI : 10.1137/0329064