P. Abbeel and A. Ng, Apprenticeship learning via inverse reinforcement learning, Twenty-first international conference on Machine learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015430
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.92

R. Akrour, M. Schoenauer, and M. Sebag, Preference-Based Policy Learning, pp.12-27, 2011.
DOI : 10.1007/978-3-642-23780-5_11
URL : https://hal.archives-ouvertes.fr/inria-00625001

R. Akrour, M. Schoenauer, and M. Sebag, APRIL: Active Preference Learning-Based Reinforcement Learning, pp.116-131, 2012.
DOI : 10.1007/978-3-642-33486-3_8
URL : https://hal.archives-ouvertes.fr/hal-00722744

C. Bergeron, J. Zaretzki, C. M. Breneman, and K. P. Bennett, Multiple instance ranking, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.48-55, 2008.
DOI : 10.1145/1390156.1390163
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.8314

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.
DOI : 10.1007/0-306-48332-7_333

S. Calinon, F. Guenter, and A. Billard, On Learning, Representing, and Generalizing a Task in a Humanoid Robot, IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), vol.37, issue.2, pp.286-298, 2007.
DOI : 10.1109/TSMCB.2006.886952

W. Cheng, J. Fürnkranz, E. Hüllermeier, and S. H. Park, Preference-Based Policy Iteration: Leveraging Preference Learning for Reinforcement Learning, pp.312-327, 2011.
DOI : 10.1007/978-3-642-23780-5_30
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.224.8007

W. Chu and Z. Ghahramani, Preference learning with Gaussian processes, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.137-144, 2005.
DOI : 10.1145/1102351.1102369
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.134.4878

C. Dimitrakakis and M. G. Lagoudakis, Rollout sampling approximate policy iteration, Machine Learning, vol.72, issue.3, pp.157-171, 2008.
DOI : 10.1007/s10994-008-5069-3
URL : http://arxiv.org/abs/0805.2027

V. Heidrich-Meisner and C. Igel, Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, p.51, 2009.
DOI : 10.1145/1553374.1553426

E. Klein, M. Geist, B. Piot, and O. Pietquin, Inverse reinforcement learning through structured classification, pp.1016-1024, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00778624

J. Kober and J. Peters, Reinforcement Learning in Robotics: A Survey, pp.579-610, 2012.

J. Z. Kolter, P. Abbeel, and A. Y. Ng, Hierarchical apprenticeship learning with application to quadruped locomotion, 2007.

G. Konidaris, S. Kuindersma, A. Barto, and R. Grupen, Constructing skill trees for reinforcement learning agents from demonstration trajectories, NIPS 23, pp.1162-1170, 2010.

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research (JMLR), vol.4, pp.1107-1149, 2003.

R. D. Luce, Individual choice behavior, 1959.

A. Ng and S. Russell, Algorithms for inverse reinforcement learning, pp.663-670, 2000.

R. N. Shepard, Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space, Psychometrika, vol.22, issue.4, pp.325-345, 1957.
DOI : 10.1007/BF02288967

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver et al., Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553501

C. Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010.
DOI : 10.2200/S00268ED1V01Y201005AIM009

P. Viappiani and C. Boutilier, Optimal Bayesian recommendation sets and myopically optimal choice query sets, pp.2352-2360, 2010.

A. Wilson, A. Fern, and P. Tadepalli, A Bayesian approach for policy learning from trajectory preference queries, pp.1142-1150, 2012.