A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201

B. Ávila Pires, M. Ghavamzadeh, and C. Szepesvári, Cost-sensitive multiclass classification risk bounds, Proceedings of the Thirtieth International Conference on Machine Learning, pp.1391-1399, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00840485

J. Bagnell, S. Kakade, A. Ng, and J. Schneider, Policy search by dynamic programming, Proceedings of Advances in Neural Information Processing Systems 16, 2003.

S. Ben-David, N. Cesa-Bianchi, D. Haussler, and P. M. Long, Characterizations of learnability for classes of {0, ..., n}-valued functions, Proceedings of the fifth annual workshop on Computational learning theory, COLT '92, pp.74-86, 1995.
DOI : 10.1145/130385.130423

A. Beygelzimer, V. Dani, T. Hayes, J. Langford, and B. Zadrozny, Error limiting reductions between classification tasks, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.49-56, 2005.
DOI : 10.1145/1102351.1102358
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.81.4721

A. Beygelzimer, J. Langford, and P. Ravikumar, Error-Correcting Tournaments, Proceedings of the 20th International Conference on Algorithmic Learning Theory, pp.247-262, 2009.
DOI : 10.1137/0214009
URL : http://arxiv.org/abs/0902.3176

S. Bradtke and A. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol.22, pp.33-57, 1996.
DOI : 10.1007/978-0-585-33656-5_4
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.857

R. Busa-Fekete and B. Kégl, Fast boosting using adversarial bandits, Proceedings of the Twenty-Seventh International Conference on Machine Learning, pp.49-56, 2010.
URL : https://hal.archives-ouvertes.fr/in2p3-00614564

L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, 1996.
DOI : 10.1007/978-1-4612-0711-5

C. Dimitrakakis and M. Lagoudakis, Algorithms and bounds for sampling-based approximate policy iteration, Recent Advances in Reinforcement Learning, 2008.

C. Dimitrakakis and M. Lagoudakis, Rollout sampling approximate policy iteration, Machine Learning, vol.72, issue.3, pp.157-171, 2008.
DOI : 10.1007/s10994-008-5069-3
URL : http://arxiv.org/abs/0805.2027

A. M. Farahmand, R. Munos, and C. Szepesvári, Error propagation for approximate policy and value iteration, Advances in Neural Information Processing Systems, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830154

A. M. Farahmand, D. Precup, A. Barreto, and M. Ghavamzadeh, CAPI: Generalized classification-based approximate policy iteration, Proceedings of the Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2013.
DOI : 10.1109/tac.2015.2418411

A. Fern, S. Yoon, and R. Givan, Approximate policy iteration with a policy language bias, Proceedings of Advances in Neural Information Processing Systems 16, 2004.

A. Fern, S. Yoon, and R. Givan, Approximate policy iteration with a policy language bias: Solving relational Markov decision processes, Journal of Artificial Intelligence Research, vol.25, pp.85-118, 2006.

V. Gabillon, A. Lazaric, and M. Ghavamzadeh, Rollout allocation strategies for classification-based policy iteration, ICML Workshop on Reinforcement Learning and Search in Very Large Spaces, 2010.

V. Gabillon, A. Lazaric, M. Ghavamzadeh, and B. Scherrer, Classification-based policy iteration with a critic, Proceedings of the Twenty-Eighth International Conference on Machine Learning, pp.1049-1056, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00590972

V. Gabillon, M. Ghavamzadeh, and B. Scherrer, Approximate dynamic programming finally performs well in the game of Tetris, Proceedings of Advances in Neural Information Processing Systems 26, pp.1754-1762, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00921250

M. Ghavamzadeh and A. Lazaric, Conservative and greedy approaches to classification-based policy iteration, Proceedings of the Twenty-Sixth Conference on Artificial Intelligence, pp.914-920, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00772610

L. Györfi, M. Kohler, A. Krzyżak, and H. Walk, A Distribution-Free Theory of Nonparametric Regression, 2002.
DOI : 10.1007/b97848

D. Haussler, Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension, Journal of Combinatorial Theory, Series A, vol.69, issue.2, pp.217-232, 1995.
DOI : 10.1016/0097-3165(95)90052-7

R. Howard, Dynamic Programming and Markov Processes, 1960.

S. Kakade, On the Sample Complexity of Reinforcement Learning, 2003.

S. Kakade and J. Langford, Approximately optimal approximate reinforcement learning, Proceedings of the Nineteenth International Conference on Machine Learning, pp.267-274, 2002.

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

M. Lagoudakis and R. Parr, Reinforcement learning as classification: Leveraging modern classifiers, Proceedings of the Twentieth International Conference on Machine Learning, pp.424-431, 2003.

J. Langford and B. Zadrozny, Relating reinforcement learning performance to classification performance, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.473-480, 2005.
DOI : 10.1145/1102351.1102411
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.408.1329

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a classification-based policy iteration algorithm, Proceedings of the Twenty-Seventh International Conference on Machine Learning, pp.607-614, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065

A. Lazaric, M. Ghavamzadeh, and R. Munos, Finite-sample analysis of least-squares policy iteration, Journal of Machine Learning Research, vol.13, pp.3041-3074, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00528596

L. Li, V. Bulitko, and R. Greiner, Focus of attention in reinforcement learning, Journal of Universal Computer Science, vol.13, issue.9, pp.1246-1269, 2007.

A. Lozano and N. Abe, Multi-class cost-sensitive boosting with p-norm loss functions, Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '08, pp.506-514, 2008.
DOI : 10.1145/1401890.1401953

P. Mineiro, Error and regret bounds for cost-sensitive multi-class classification reduction to regression, 2010.

R. Munos, Performance bounds in Lp norm for approximate value iteration, SIAM Journal on Control and Optimization, 2007.
DOI : 10.1137/040614384
URL : https://hal.archives-ouvertes.fr/inria-00124685

R. Munos and C. Szepesvári, Finite time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882

D. Pollard, Convergence of Stochastic Processes, 1984.
DOI : 10.1007/978-1-4612-5254-2

B. Scherrer, M. Ghavamzadeh, V. Gabillon, and M. Geist, Approximate modified policy iteration, Proceedings of the Twenty-Ninth International Conference on Machine Learning, pp.1207-1214, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758882