P. Abbeel and A. Y. Ng, Apprenticeship learning via inverse reinforcement learning, Proceedings of the Twenty-First International Conference on Machine Learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015430

R. Akrour, M. Schoenauer, and M. Sebag, Preference-Based Policy Learning, Gunopulos et al. [12], pp.12-27, 2011.
DOI : 10.1007/978-3-642-23780-5_11

URL : https://hal.archives-ouvertes.fr/inria-00625001

R. Akrour, M. Schoenauer, and M. Sebag, APRIL: Active Preference Learning-Based Reinforcement Learning, ECML/PKDD, 2012.
DOI : 10.1007/978-3-642-33486-3_8

URL : https://hal.archives-ouvertes.fr/hal-00722744

H. Bay, T. Tuytelaars, and L. Van Gool, SURF: Speeded Up Robust Features, Computer Vision - ECCV 2006, pp.404-417, 2006.
DOI : 10.1007/11744023_32

S. Bickel, C. Sawade, and T. Scheffer, Transfer learning by distribution matching for targeted advertising, Advances in Neural Information Processing Systems, NIPS 21, pp.145-152, 2008.

E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Santa Fe Institute Studies in the Sciences of Complexity, 1999.

J. Bongard, V. Zykov, and H. Lipson, Resilient machines through continuous self-modeling, Science, vol.314, issue.5802, pp.1118-1121, 2006.

E. Brochu, N. de Freitas, and A. Ghosh, Active preference learning with discrete choice data, Advances in Neural Information Processing Systems, pp.409-416, 2008.

S. Calinon, F. Guenter, and A. Billard, On Learning, Representing, and Generalizing a Task in a Humanoid Robot, IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), vol.37, issue.2, pp.286-298, 2007.
DOI : 10.1109/TSMCB.2006.886952

W. Cheng, J. Fürnkranz, E. Hüllermeier, and S. Park, Preference-Based Policy Iteration: Leveraging Preference Learning for Reinforcement Learning, Gunopulos et al. [12], pp.312-327, 2011.
DOI : 10.1007/978-3-642-23780-5_30

T. Joachims, A support vector method for multivariate performance measures, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pp.377-384, 2005.
DOI : 10.1145/1102351.1102399

T. Joachims, Training linear SVMs in linear time, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, pp.217-226, 2006.
DOI : 10.1145/1150402.1150429

J. Z. Kolter, P. Abbeel, and A. Y. Ng, Hierarchical apprenticeship learning with application to quadruped locomotion, NIPS, 2007.

G. Konidaris, S. Kuindersma, A. Barto, and R. Grupen, Constructing skill trees for reinforcement learning agents from demonstration trajectories, Advances in Neural Information Processing Systems 23, pp.1162-1170, 2010.

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

A. Y. Ng and S. Russell, Algorithms for inverse reinforcement learning, Proc. of the Seventeenth International Conference on Machine Learning (ICML-00), pp.663-670, 2000.

C. Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010.
DOI : 10.2200/S00268ED1V01Y201005AIM009

G. Tesauro, Programming backgammon using self-teaching neural nets, Artificial Intelligence, vol.134, issue.1-2, pp.181-199, 2002.
DOI : 10.1016/S0004-3702(01)00110-2

P. Viappiani and C. Boutilier, Optimal Bayesian recommendation sets and myopically optimal choice query sets, NIPS, pp.2352-2360, 2010.