P. Abbeel, Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control, PhD thesis, Stanford University, 2008.

R. Akrour, M. Schoenauer, and M. Sebag, APRIL: Active Preference Learning-Based Reinforcement Learning, Machine Learning and Knowledge Discovery in Databases (ECML PKDD), pp. 116-131, 2012.
DOI: 10.1007/978-3-642-33486-3_8
URL: https://hal.archives-ouvertes.fr/hal-00722744

C. Bergeron, J. Zaretzki, C. M. Breneman, and K. P. Bennett, Multiple instance ranking, Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 48-55, 2008.
DOI: 10.1145/1390156.1390163

E. Brochu, N. de Freitas, and A. Ghosh, Active preference learning with discrete choice data, NIPS, 2007.

E. Brochu, T. Brochu, and N. de Freitas, A Bayesian interactive optimization approach to procedural animation design, Symposium on Computer Animation, pp. 103-112, 2010.

W. Chu and Z. Ghahramani, Preference learning with Gaussian processes, Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 137-144, 2005.
DOI: 10.1145/1102351.1102369
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.134.4878

M. P. Deisenroth, G. Neumann, and J. Peters, A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, vol. 2, issue 1-2, pp. 1-142, 2013.
DOI: 10.1561/2300000021

C. Dimitrakakis and M. G. Lagoudakis, Rollout sampling approximate policy iteration, Machine Learning, vol. 72, issue 3, pp. 157-171, 2008.
DOI: 10.1007/978-3-540-87479-9_6
URL: http://arxiv.org/abs/0805.2027

J. Fürnkranz, E. Hüllermeier, W. Cheng, and S. Park, Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, Machine Learning, pp. 123-156, 2012.
DOI: 10.1007/s10994-012-5313-8

N. Hansen and A. Ostermeier, Completely Derandomized Self-Adaptation in Evolution Strategies, Evolutionary Computation, vol. 9, issue 2, pp. 159-195, 2001.
DOI: 10.1162/106365601750190398

H. H. Hoos, Programming by optimization, Communications of the ACM, vol. 55, issue 2, pp. 70-80, 2012.
DOI: 10.1145/2076450.2076469

A. Jain, T. Joachims, and A. Saxena, Learning trajectory preferences for manipulators via iterative improvement, NIPS, 2013.

D. R. Jones, M. Schonlau, and W. J. Welch, Efficient global optimization of expensive black-box functions, Journal of Global Optimization, vol. 13, issue 4, pp. 455-492, 1998.
DOI: 10.1023/A:1008306431147

W. B. Knox, P. Stone, and C. Breazeal, Training a Robot via Human Feedback: A Case Study, Int. Conf. on Social Robotics, pp. 460-470, 2013.
DOI: 10.1007/978-3-319-02675-6_46

G. Konidaris, S. Kuindersma, A. Barto, and R. Grupen, Constructing skill trees for reinforcement learning agents from demonstration trajectories, NIPS 23, pp. 1162-1170, 2010.

M. Lagoudakis and R. Parr, Least-squares policy iteration, J. Machine Learning Research, vol. 4, pp. 1107-1149, 2003.

J. Lin, N. Madnani, and B. Dorr, Putting the user in the loop: Interactive maximal marginal relevance for query-focused summarization, NAACL, pp. 305-308, 2010.

D. Lizotte, Practical Bayesian Optimization, PhD thesis, University of Alberta, 2008.

A. Lőrincz, V. Gyenes, M. Kiszlinger, and I. Szita, Mind model seems necessary for the emergence of communication, Neural Information Processing - Letters and Reviews, vol. 11, issue 4-6, pp. 109-121, 2007.

R. D. Luce, Individual Choice Behavior: A Theoretical Analysis, Wiley, 1959.

R. Munos, Error bounds for approximate policy iteration, ICML, pp. 560-567, 2003.

F. Radlinski, M. Kurup, and T. Joachims, How does clickthrough data reflect retrieval quality?, Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM '08), pp. 43-52, 2008.
DOI: 10.1145/1458082.1458092
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.454

R. N. Shepard, Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space, Psychometrika, vol. 22, issue 4, pp. 325-345, 1957.
DOI: 10.1007/BF02288967

P. Shivaswamy and T. Joachims, Online structured prediction via coactive learning, ICML, 2012.

J. Snoek, H. Larochelle, and R. P. Adams, Practical Bayesian optimization of machine learning algorithms, NIPS, pp. 2960-2968, 2012.

P. Viappiani and C. Boutilier, Optimal Bayesian recommendation sets and myopically optimal choice query sets, NIPS, pp. 2352-2360, 2010.

A. Wilson, A. Fern, and P. Tadepalli, A Bayesian approach for policy learning from trajectory preference queries, NIPS, pp. 1142-1150, 2012.

Y. Yue and T. Joachims, Interactively optimizing information retrieval systems as a dueling bandits problem, Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), 2009.
DOI: 10.1145/1553374.1553527
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.6166