S. Kakade, On the Sample Complexity of Reinforcement Learning, 2003.

S. Kakade and J. Langford, Approximately optimal approximate reinforcement learning, Proceedings of ICML, pp. 267-274, 2002.

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol. 4, pp. 1107-1149, 2003.

M. Lagoudakis and R. Parr, Reinforcement Learning as Classification: Leveraging Modern Classifiers, Proceedings of ICML, pp. 424-431, 2003.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a Classification-based Policy Iteration Algorithm, Proceedings of ICML, pp. 607-614, 2010.
URL: https://hal.archives-ouvertes.fr/inria-00482065

R. Munos, Error Bounds for Approximate Policy Iteration, Proceedings of ICML, pp. 560-567, 2003.

R. Munos, Performance Bounds in Lp-norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, 2007.
URL: https://hal.archives-ouvertes.fr/inria-00124685

M. Puterman, Markov Decision Processes, 1994.
DOI: 10.1002/9780470316887

B. Scherrer, Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris, Journal of Machine Learning Research, vol. 14, pp. 1175-1221, 2013.
URL: https://hal.archives-ouvertes.fr/inria-00185271

B. Scherrer and B. Lesner, On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes, Neural Information Processing Systems (NIPS), 2012.
URL: https://hal.archives-ouvertes.fr/hal-00758809

B. Scherrer, V. Gabillon, M. Ghavamzadeh, and M. Geist, Approximate Modified Policy Iteration, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00758882