A. Y. Ng and M. Jordan, PEGASUS: A policy search method for large MDPs and POMDPs, Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence, pp.406-415, 2000.

L. Peshkin and C. R. Shelton, Learning from scarce experience, ICML, pp.498-505, 2002.

D. Aberdeen, Policy-gradient methods for planning, Advances in Neural Information Processing Systems 18, pp.9-16, 2006.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, issue.2, pp.1107-1149, 2003.

A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, COLT-19, pp.574-588, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00830201

A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp.16-18, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00830201

D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.

J. A. Boyan and A. W. Moore, Generalization in reinforcement learning: Safely approximating the value function, NIPS-7, pp.369-376, 1995.

G. J. Gordon, Stable Function Approximation in Dynamic Programming, Proc. of ICML 12, pp.261-268, 1995.
DOI : 10.1016/B978-1-55860-377-6.50040-2

D. Ormoneit and S. Sen, Kernel-based reinforcement learning, Machine Learning, pp.161-178, 2002.

M. Riedmiller, Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 16th European Conference on Machine Learning, pp.317-328, 2005.
DOI : 10.1007/11564096_32

S. Kalyanakrishnan and P. Stone, Batch reinforcement learning in a complex domain, Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, AAMAS '07, p.10, 2007.
DOI : 10.1145/1329125.1329241

A. Antos, C. Szepesvári, and R. Munos, Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.330-337, 2007.
DOI : 10.1109/ADPRL.2007.368207
URL : https://hal.archives-ouvertes.fr/inria-00124833

D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control (The Discrete Time Case), 1978.

N. Cristianini and J. Shawe-Taylor, An introduction to support vector machines (and other kernel-based learning methods), 2000.
DOI : 10.1017/CBO9780511801389

P. L. Bartlett, P. M. Long, and R. C. Williamson, Fat-Shattering and the Learnability of Real-Valued Functions, Journal of Computer and System Sciences, vol.52, issue.3, pp.434-452, 1996.
DOI : 10.1006/jcss.1996.0033