M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.

Cs. Szepesvári and R. Munos, Finite time bounds for sampling based fitted value iteration, ICML'2005, pp.881-886, 2005.

D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control (The Discrete Time Case), 1978.

B. Yu, Rates of convergence for empirical processes of stationary mixing sequences, The Annals of Probability, pp.94-116, 1994.

R. Meir, Nonparametric time series prediction through adaptive model selection, Machine Learning, pp.5-34, 2000.

A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, COLT-19, pp.574-588, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00830201

M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, 1999.
DOI : 10.1017/CBO9780511624216

A. Antos, C. Szepesvári, and R. Munos, Approximate action-value iteration in continuous state spaces: learning with a single trajectory, 2006.

E. W. Cheney, Introduction to approximation theory, 1966.

L. Györfi, M. Kohler, A. Krzyżak, and H. Walk, A distribution-free theory of nonparametric regression, 2002.
DOI : 10.1007/b97848

D. Haussler, Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension, Journal of Combinatorial Theory, Series A, vol.69, issue.2, pp.217-232, 1995.
DOI : 10.1016/0097-3165(95)90052-7

R. Munos, Error bounds for approximate policy iteration, ICML'2003, pp.560-567, 2003.

R. Munos and C. Szepesvári, Finite time bounds for sampling based fitted value iteration, Journal of Machine Learning Research, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00120882

C. S. Chow and J. N. Tsitsiklis, An optimal multigrid algorithm for continuous state discrete time stochastic control, Proceedings of the 27th IEEE Conference on Decision and Control, pp.898-914, 1991.
DOI : 10.1109/CDC.1988.194660

P. Bougerol and N. Picard, Strict Stationarity of Generalized Autoregressive Processes, The Annals of Probability, vol.20, issue.4, pp.1714-1730, 1992.
DOI : 10.1214/aop/1176989526