M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with fitted policy iteration and a single sample path: approximate iterative policy evaluation, 2006.

D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control (The Discrete Time Case), 1978.

R. S. Sutton and A. G. Barto, Toward a modern theory of adaptive networks: Expectation and prediction, Proc. of the Ninth Annual Conference of the Cognitive Science Society, 1987.
DOI : 10.1037/0033-295X.88.2.135

R. Munos, Error bounds for approximate policy iteration, 19th International Conference on Machine Learning, pp.560-567, 2003.

S. P. Meyn and R. Tweedie, Markov Chains and Stochastic Stability, 1993.

M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, 1999.
DOI : 10.1017/CBO9780511624216

B. Yu, Rates of convergence for empirical processes of stationary mixing sequences, The Annals of Probability, pp.94-116, 1994.

D. Haussler, Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension, Journal of Combinatorial Theory, Series A, vol.69, issue.2, pp.217-232, 1995.
DOI : 10.1016/0097-3165(95)90052-7

A. L. Samuel, Some studies in machine learning using the game of checkers, IBM Journal of Research and Development, pp.210-229, 1959.

R. E. Bellman and S. E. Dreyfus, Functional approximation and dynamic programming, Mathematical Tables and Other Aids to Computation, pp.247-251, 1959.

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 1998.

G. J. Gordon, Stable Function Approximation in Dynamic Programming, Proceedings of the Twelfth International Conference on Machine Learning, pp.261-268, 1995.
DOI : 10.1016/B978-1-55860-377-6.50040-2

J. N. Tsitsiklis and B. Van Roy, Feature-based methods for large scale dynamic programming, Machine Learning, pp.59-94, 1996.

C. Guestrin, D. Koller, and R. Parr, Max-norm projections for factored MDPs, Proceedings of the International Joint Conference on Artificial Intelligence, 2001.

D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.

X. Wang and T. G. Dietterich, Efficient value function approximation using regression trees, Proceedings of the IJCAI Workshop on Statistical Machine Learning for Large-Scale Optimization, 1999.

T. G. Dietterich and X. Wang, Batch value function approximation via support vectors, Advances in Neural Information Processing Systems 14, 2002.

Cs. Szepesvári and R. Munos, Finite time bounds for sampling based fitted value iteration, ICML'2005, 2005.

R. Meir, Nonparametric time series prediction through adaptive model selection, Machine Learning, pp.5-34, 2000.