A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, pp.89-129, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00830201

T. Archibald, K. McKinnon, and L. Thomas, On the Generation of Markov Decision Processes, Journal of the Operational Research Society, vol.46, issue.3, pp.354-361, 1995.
DOI : 10.1057/jors.1995.50

J. Bagnell, S. Kakade, A. Ng, and J. Schneider, Policy search by dynamic programming, Neural Information Processing Systems, 2003.

D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

A. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor, Regularized policy iteration, Advances in Neural Information Processing Systems, pp.441-448, 2009.

A. Farahmand, R. Munos, and C. Szepesvári, Error propagation for approximate policy and value iteration (extended version), NIPS, 2010.

A. Fern, S. Yoon, and R. Givan, Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes, Journal of Artificial Intelligence Research, vol.25, pp.75-118, 2006.

H. Daumé III, J. Langford, and D. Marcu, Search-based structured prediction, Machine Learning, vol.75, pp.297-325, 2009.

S. Kakade, On the Sample Complexity of Reinforcement Learning, PhD thesis, University College London, 2003.

S. Kakade and J. Langford, Approximately optimal approximate reinforcement learning, ICML, pp.267-274, 2002.

M. Lagoudakis and R. Parr, Reinforcement Learning as Classification: Leveraging Modern Classifiers, Proceedings of ICML, pp.424-431, 2003.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a Classification-based Policy Iteration Algorithm, Proceedings of ICML, pp.607-614, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065

A. Lazaric, M. Ghavamzadeh, and R. Munos, Finite-Sample Analysis of Least-Squares Policy Iteration, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00528596

R. Munos, Error Bounds for Approximate Policy Iteration, International Conference on Machine Learning (ICML), pp.560-567, 2003.

R. Munos, Performance Bounds in Lp-norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00124685

R. Munos and C. Szepesvári, Finite time bounds for sampling based fitted value iteration, Journal of Machine Learning Research (JMLR), vol.9, pp.815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882

M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, 1994.
DOI : 10.1002/9780470316887

B. Scherrer, Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris, Journal of Machine Learning Research, vol.14, pp.1175-1221, 2013.
URL : https://hal.archives-ouvertes.fr/inria-00185271

B. Scherrer and B. Lesner, On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes, Neural Information Processing Systems (NIPS), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758809

B. Scherrer, V. Gabillon, M. Ghavamzadeh, and M. Geist, Approximate Modified Policy Iteration, Proceedings of ICML, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758882

C. Szepesvári, Reinforcement Learning Algorithms for MDPs, 2010.
DOI : 10.1002/9780470400531.eorms0714