A. Lazaric, M. Restelli, and A. Bonarini, Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods, Proc. of NIPS, 2007.

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, X-armed bandits, Journal of Machine Learning Research, vol.12, pp.1655-1695, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00450235

D. Negoescu, P. Frazier, and W. Powell, The Knowledge-Gradient Algorithm for Sequencing Experiments in Drug Discovery, INFORMS Journal on Computing, vol.23, issue.3, pp.346-363, 2011.
DOI : 10.1287/ijoc.1100.0417

T. Dietterich and G. Bakiri, Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research, vol.2, pp.263-286, 1995.

M. G. Lagoudakis and R. Parr, Reinforcement learning as classification: Leveraging modern classifiers, Proc. of ICML, 2003.

J. L. Bentley, Multidimensional binary search trees used for associative searching, Communications of the ACM, vol.18, issue.9, pp.509-517, 1975.
DOI : 10.1145/361002.361007

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a classification-based policy iteration algorithm, Proc. of ICML, pp.607-614, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065

R. Sutton, Generalization in reinforcement learning: Successful examples using sparse coarse coding, Proc. of NIPS, pp.1038-1044, 1996.

A. Berger, Error-correcting output coding for text classification, Workshop on Machine Learning for Information Filtering, 1999.

C. Dimitrakakis and M. G. Lagoudakis, Rollout sampling approximate policy iteration, Machine Learning, vol.4, issue.1, pp.157-171, 2008.
DOI : 10.1007/s10994-008-5069-3
URL : http://arxiv.org/abs/0805.2027

C. Tham, Modular on-line function approximation for scaling up reinforcement learning, 1994.

G. Tesauro, Practical issues in temporal difference learning, Machine Learning, vol.8, pp.257-277, 1992.

G. Tesauro and G. R. Galperin, On-Line Policy Improvement Using Monte-Carlo Search, Proc. of NIPS, pp.1068-1074, 1997.

J. Pazis and M. G. Lagoudakis, Reinforcement learning in multidimensional continuous action spaces, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp.97-104, 2011.
DOI : 10.1109/ADPRL.2011.5967381

J. Pazis and R. Parr, Generalized Value Functions for Large Action Sets, Proc. of ICML 2011, pp.1185-1192, 2011.

A. Beygelzimer, J. Langford, and B. Zadrozny, Machine Learning Techniques: Reductions Between Prediction Quality Metrics, pp.3-28, 2008.
DOI : 10.1007/978-0-387-79361-0_1

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.3348

K. Crammer and Y. Singer, On the Learnability and Design of Output Codes for Multiclass Problems, Machine Learning, vol.47, issue.2/3, pp.201-233, 2002.
DOI : 10.1023/A:1013637720281

M. Cissé, T. Artières, and P. Gallinari, Learning efficient error correcting output codes for large hierarchical multi-class problems, ECML/PKDD Workshop on Large-Scale Hierarchical Classification, pp.37-49, 2011.