A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
DOI : 10.1007/s10994-007-5038-2

URL : https://hal.archives-ouvertes.fr/hal-00830201

L. Baird, Residual Algorithms: Reinforcement Learning with Function Approximation, Proceedings of the Twelfth International Conference on Machine Learning, pp.30-37, 1995.
DOI : 10.1016/B978-1-55860-377-6.50013-X

D. Bertsekas, Dynamic Programming and Optimal Control, volume II, Athena Scientific, 2007.

D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

J. Boyan, Least-squares temporal difference learning, Proceedings of the 16th International Conference on Machine Learning, pp.49-56, 1999.

S. Bradtke and A. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol.22, issue.1-3, pp.33-57, 1996.
DOI : 10.1007/978-0-585-33656-5_4

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.857

V. H. de la Peña and G. Pang, Exponential inequalities for self-normalized processes with applications, Electronic Communications in Probability, vol.14, pp.372-381, 2009.

V. H. de la Peña, M. J. Klass, and T. L. Lai, Pseudo-maximization and self-normalized processes, Probability Surveys, vol.4, pp.172-192, 2007.

L. Györfi, M. Kohler, A. Krzyżak, and H. Walk, A Distribution-Free Theory of Nonparametric Regression, Springer, 2002.
DOI : 10.1007/b97848

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Finite-sample analysis of LSTD, Proceedings of the 27th International Conference on Machine Learning, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482189

R. Meir, Nonparametric time series prediction through adaptive model selection, Machine Learning, vol.39, issue.1, pp.5-34, 2000.

B. Scherrer, Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view, Proceedings of the 27th International Conference on Machine Learning, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00537403

P. J. Schweitzer and A. Seidmann, Generalized polynomial approximations in Markovian decision processes, Journal of Mathematical Analysis and Applications, vol.110, issue.2, pp.568-582, 1985.
DOI : 10.1016/0022-247X(85)90317-8

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

M. Talagrand, The Generic Chaining: Upper and Lower Bounds of Stochastic Processes, Springer, 2005.

J. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, pp.674-690, 1997.
DOI : 10.1109/9.580874

J. Tsitsiklis and B. Van Roy, Average cost temporal-difference learning, Automatica, vol.35, issue.11, pp.1799-1808, 1999.
DOI : 10.1016/S0005-1098(99)00099-0

B. Yu, Rates of convergence for empirical processes of stationary mixing sequences, The Annals of Probability, vol.22, issue.1, pp.94-116, 1994.

H. Yu, Convergence of least squares temporal difference methods under general conditions, Proceedings of the 27th International Conference on Machine Learning, 2010.

H. Yu and D. P. Bertsekas, Error Bounds for Approximations from Projected Linear Equations, Mathematics of Operations Research, vol.35, issue.2, pp.306-329, 2010.
DOI : 10.1287/moor.1100.0441