A. Beygelzimer, J. Langford, and B. Zadrozny, Machine learning techniques: reductions between prediction quality metrics, Performance Modeling and Engineering, pp.3-28, 2008.

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, X-armed bandits, Journal of Machine Learning Research, vol.12, pp.1655-1695, 2011.

M. Cissé, T. Artières, and P. Gallinari, Learning efficient error correcting output codes for large hierarchical multi-class problems, Workshop on Large-Scale Hierarchical Classification ECML, pp.37-49, 2011.

K. Crammer and Y. Singer, On the Learnability and Design of Output Codes for Multiclass Problems, Machine Learning, pp.201-233, 2002.

T. Dietterich and G. Bakiri, Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research, vol.2, pp.263-286, 1995.

C. Dimitrakakis and M. Lagoudakis, Rollout sampling approximate policy iteration, Machine Learning, pp.157-171, 2008.

M. Lagoudakis and R. Parr, Reinforcement learning as classification: Leveraging modern classifiers, Proc. of ICML '03, 2003.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a classification-based policy iteration algorithm, Proc. of ICML '10, pp.607-614, 2010.

A. Lazaric, M. Restelli, and A. Bonarini, Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods, Proc. of NIPS '07, 2007.

D. Negoescu, P. Frazier, and W. Powell, The knowledge-gradient algorithm for sequencing experiments in drug discovery, INFORMS Journal on Computing, vol.23, issue.3, pp.346-363, 2011.

J. Pazis and M. Lagoudakis, Reinforcement Learning in Multidimensional Continuous Action Spaces, Proc. of Adaptive Dynamic Programming and Reinforcement Learning, pp.97-104, 2011.

J. Pazis and R. Parr, Generalized Value Functions for Large Action Sets, Proc. of ICML '11, pp.1185-1192, 2011.

G. Tesauro and G. Galperin, On-Line Policy Improvement Using Monte-Carlo Search, Proc. of NIPS '97, pp.1068-1074, 1997.