A. Fern, S. Yoon, and R. Givan, Approximate policy iteration with a policy language bias: Solving relational Markov decision processes, Journal of Artificial Intelligence Research, vol.25, pp.85-118, 2006.

V. Gabillon, A. Lazaric, M. Ghavamzadeh, and B. Scherrer, Classification-based policy iteration with a critic, Proceedings of the Twenty-Eighth International Conference on Machine Learning, pp.1049-1056, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00590972

R. Howard, Dynamic Programming and Markov Processes, 1960.

S. Kakade and J. Langford, Approximately optimal approximate reinforcement learning, Proceedings of the Nineteenth International Conference on Machine Learning, pp.267-274, 2002.

S. Kakade, On the Sample Complexity of Reinforcement Learning, 2003.

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

M. Lagoudakis and R. Parr, Reinforcement learning as classification: Leveraging modern classifiers, Proceedings of the Twentieth International Conference on Machine Learning, pp.424-431, 2003.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a classification-based policy iteration algorithm, Proceedings of the Twenty-Seventh International Conference on Machine Learning, pp.607-614, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a classification-based policy iteration algorithm, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065