A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.22, issue.1, pp.89-129, 2008.
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201

J. Bagnell, S. Kakade, A. Ng, and J. Schneider, Policy search by dynamic programming, Proceedings of Advances in Neural Information Processing Systems 16, 2003.

S. Ben-david, N. Cesa-bianchi, D. Haussler, and P. M. Long, }-valued functions, Proceedings of the fifth annual workshop on Computational learning theory , COLT '92, pp.74-86, 1995.
DOI : 10.1145/130385.130423

A. Beygelzimer, V. Dani, T. Hayes, J. Langford, and B. Zadrozny, Error limiting reductions between classification tasks, Proceedings of the 22nd international conference on Machine learning , ICML '05, pp.49-56, 2005.
DOI : 10.1145/1102351.1102358
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.81.4721

A. Beygelzimer, J. Langford, and P. Ravikumar, Error-correcting tournaments. CoRR, abs/0902, p.3176, 2009.
DOI : 10.1007/978-3-642-04414-4_22
URL : http://arxiv.org/abs/0902.3176

S. Bradtke and A. Barto, Linear least-squares algorithms for temporal difference learning, Journal of Machine Learning, vol.22, pp.33-57, 1996.

R. Busa-fekete and B. Kégl, Fast boosting using adversarial bandits, Proceedings of the Twenty-Seventh International Conference on Machine Learning, pp.49-56, 2010.
URL : https://hal.archives-ouvertes.fr/in2p3-00614564

L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, 1996.
DOI : 10.1007/978-1-4612-0711-5

C. Dimitrakakis and M. Lagoudakis, Algorithms and bounds for sampling-based approximate policy iteration, Recent Advances in Reinforcement Learning, 2008.

C. Dimitrakakis and M. Lagoudakis, Rollout sampling approximate policy iteration, Machine Learning, vol.4, issue.1, pp.157-171, 2008.
DOI : 10.1007/s10994-008-5069-3
URL : http://arxiv.org/abs/0805.2027

A. M. Farahmand, R. Munos, and C. Szepesvári, Error propagation for approximate policy and value iteration, Advances in Neural Information Processing Systems, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830154

A. Fern, S. Yoon, and R. Givan, Approximate policy iteration with a policy language bias, Proceedings of Advances in Neural Information Processing Systems 16, 2004.

A. Fern, S. Yoon, and R. Givan, Approximate policy iteration with a policy language bias: Solving relational Markov decision processes, Journal of Artificial Intelligence Research, vol.25, pp.85-118, 2006.

V. Gabillon, A. Lazaric, and M. Ghavamzadeh, Rollout allocation strategies for classification-based policy iteration, ICML 2010 Workshop on Reinforcement Learning and Search in Very Large Spaces, 2010.

R. A. Howard, Dynamic Programming and Markov Processes, 1960.

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

M. Lagoudakis and R. Parr, Reinforcement learning as classification: Leveraging modern classifiers, Proceedings of the Twentieth International Conference on Machine Learning, pp.424-431, 2003.

J. Langford and B. Zadrozny, Relating reinforcement learning performance to classification performance, Proceedings of the 22nd international conference on Machine learning , ICML '05, pp.473-480, 2005.
DOI : 10.1145/1102351.1102411

L. Li, V. Bulitko, and R. Greiner, Focus of attention in reinforcement learning, Journal of Universal Computer Science, vol.13, issue.9, pp.1246-1269, 2007.

A. Lozano and N. Abe, Multi-class cost-sensitive boosting with p-norm loss functions, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD 08, pp.506-514, 2008.
DOI : 10.1145/1401890.1401953

R. Munos, Performance bounds in Lp norm for approximate value iteration, SIAM Journal of Control and Optimization, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00124685

R. Munos and C. Szepesvári, Finite time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882

D. Pollard, Convergence of Stochastic Processes, 1984.
DOI : 10.1007/978-1-4612-5254-2

H. Tu and H. Lin, One-sided support vector regression for multiclass cost-sensitive classification, Proceedings of the Twenty-Seventh International Conference on Machine learning, pp.49-56, 2010.

B. Zadrozny, J. Langford, and N. Abe, Cost-sensitive learning by cost-proportionate example weighting, Third IEEE International Conference on Data Mining, p.435, 2003.
DOI : 10.1109/ICDM.2003.1250950
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.9874