R. Akrour, M. Schoenauer, and M. Sebag, Preference-Based Policy Learning, Proc. Eur. Conf. on Machine Learning and Knowledge Discovery from Databases, pp.12-27, 2011.
DOI : 10.1007/978-3-642-23780-5_11

URL : https://hal.archives-ouvertes.fr/inria-00625001

R. Akrour, M. Schoenauer, M. Sebag, and J. Souplet, Programming by Feedback, Proc. Int. Conf. on Machine Learning (ICML), volume 32 of JMLR Proceedings, pp.1503-1511, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00980839

P. Auer and R. Ortner, Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning, Advances in Neural Information Processing Systems (NIPS), pp.49-56, 2007.

D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, 2000.

C. Burges, R. Ragno, and Q. Le, Learning to Rank with Nonsmooth Cost Functions, Advances in Neural Information Processing Systems (NIPS), pp.193-200, 2006.

S. Calinon and A. Billard, Active Teaching in Robot Programming by Demonstration, RO-MAN 2007, The 16th IEEE International Symposium on Robot and Human Interactive Communication, pp.702-707, 2007.
DOI : 10.1109/ROMAN.2007.4415177

M. P. Deisenroth, G. Neumann, and J. Peters, A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, vol.2, issue.1-2, pp.1-142, 2013.
DOI : 10.1561/2300000021

S. Filippi, O. Cappé, and A. Garivier, Optimism in reinforcement learning and Kullback-Leibler divergence, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp.115-122, 2010.
DOI : 10.1109/ALLERTON.2010.5706896

URL : https://hal.archives-ouvertes.fr/hal-00476116

J. Fürnkranz, E. Hüllermeier, W. Cheng, and S. Park, Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, Machine Learning, vol.89, issue.1-2, pp.123-156, 2012.
DOI : 10.1007/s10994-012-5313-8

S. Gelly, L. Kocsis, M. Schoenauer, M. Sebag, D. Silver et al., The grand challenge of computer Go, Communications of the ACM, vol.55, issue.3, pp.106-113, 2012.
DOI : 10.1145/2093548.2093574

URL : https://hal.archives-ouvertes.fr/hal-00695370

S. Gelly and D. Silver, Combining online and offline knowledge in UCT, Proceedings of the 24th International Conference on Machine Learning, ICML '07, pp.273-280, 2007.
DOI : 10.1145/1273496.1273531

URL : https://hal.archives-ouvertes.fr/inria-00164003

A. Guez, N. Heess, D. Silver, and P. Dayan, Bayes-Adaptive Simulation-based Search with Value Function Approximation, Advances in Neural Information Processing Systems (NIPS), pp.451-459, 2014.

N. Heess, D. Silver, and Y. W. Teh, Actor-critic reinforcement learning with energy-based policies, Proc. Eur. Workshop on Reinforcement Learning (EWRL), JMLR Proceedings, pp.43-58, 2012.

V. Heidrich-Meisner and C. Igel, Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, p.51, 2009.
DOI : 10.1145/1553374.1553426

A. Jain, B. Wojcik, T. Joachims, and A. Saxena, Learning Trajectory Preferences for Manipulators via Iterative Improvement, Advances in Neural Information Processing Systems (NIPS), 2013.

T. Joachims, A support vector method for multivariate performance measures, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pp.377-384, 2005.
DOI : 10.1145/1102351.1102399

W. B. Knox, P. Stone, and C. Breazeal, Training a Robot via Human Feedback: A Case Study, Proc. 5th Intl Conf. on Social Robotics, pp.460-470, 2013.
DOI : 10.1007/978-3-319-02675-6_46

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, Proc. Eur. Conf. on Machine Learning (ECML), pp.282-293, 2006.
DOI : 10.1007/11871842_29

G. Konidaris, S. Kuindersma, A. Barto, and R. Grupen, Constructing skill trees for reinforcement learning agents from demonstration trajectories, Advances in Neural Information Processing Systems (NIPS), pp.1162-1170, 2010.

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, and F. Huang, A tutorial on energy-based learning, Predicting Structured Data, MIT Press, 2006.

P. Li, C. J. Burges, and Q. Wu, McRank: Learning to rank using multiple classification and gradient boosting, Advances in Neural Information Processing Systems (NIPS), pp.897-904, 2007.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, pp.529-533, 2015.
DOI : 10.1038/nature14236

A. Y. Ng and S. Russell, Algorithms for Inverse Reinforcement Learning, Proc. Int. Conf. on Machine Learning (ICML), pp.663-670, 2000.

S. Schaal, A. Ijspeert, and A. Billard, Computational approaches to motor learning by imitation, Philosophical Transactions of the Royal Society B: Biological Sciences, vol.358, issue.1431, pp.537-547, 2003.
DOI : 10.1098/rstb.2002.1258

R. S. Sutton, Generalization in reinforcement learning: Successful examples using sparse coarse coding, Advances in Neural Information Processing Systems (NIPS), pp.1038-1044, 1995.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

C. Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010.
DOI : 10.2200/S00268ED1V01Y201005AIM009

G. Tesauro and T. J. Sejnowski, A parallel network that learns to play backgammon, Artificial Intelligence, vol.39, issue.3, pp.357-390, 1989.
DOI : 10.1016/0004-3702(89)90017-9

A. Wilson, A. Fern, and P. Tadepalli, A Bayesian Approach for Policy Learning from Trajectory Preference Queries, Advances in Neural Information Processing Systems (NIPS), pp.1142-1150, 2012.