A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, COLT-19, pp.574-588, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00830201

A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00830201

A. Antos, C. Szepesvári, and R. Munos, Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.330-337, 2007.
DOI : 10.1109/ADPRL.2007.368207
URL : https://hal.archives-ouvertes.fr/inria-00124833

D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control (The Discrete Time Case), 1978.

D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

N. Cristianini and J. Shawe-Taylor, An introduction to support vector machines (and other kernel-based learning methods), 2000.
DOI : 10.1017/CBO9780511801389

J. A. Boyan and A. W. Moore, Generalization in reinforcement learning: Safely approximating the value function, NIPS-7, pp.369-376, 1995.

P. L. Bartlett, P. M. Long, and R. C. Williamson, Fat-Shattering and the Learnability of Real-Valued Functions, Journal of Computer and System Sciences, vol.52, issue.3, pp.434-452, 1996.
DOI : 10.1006/jcss.1996.0033

A. N. Kolmogorov and V. M. Tihomirov, ε-entropy and ε-capacity of sets in functional space, pp.277-364, 1961.

R. Munos and C. Szepesvári, Finite time bounds for sampling based fitted value iteration, Computer and Automation Research Institute of the Hungarian Academy of Sciences, pp.13-17, 2006.

A. Y. Ng and M. Jordan, PEGASUS: A policy search method for large MDPs and POMDPs, Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence, pp.406-415, 2000.

P. L. Bartlett and A. Tewari, Sample complexity of policy search with known dynamics, NIPS-19, 2007.

M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, 1999.
DOI : 10.1017/CBO9780511624216

M. Riedmiller, Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 16th European Conference on Machine Learning, pp.317-328, 2005.
DOI : 10.1007/11564096_32

S. Kalyanakrishnan and P. Stone, Batch reinforcement learning in a complex domain, Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS '07, 2007.
DOI : 10.1145/1329125.1329241