Preference-Based Policy Learning, Proc. Eur. Conf. on Machine Learning and Knowledge Discovery from Databases, pp.12-27, 2011.
DOI : 10.1007/978-3-642-23780-5_11
URL : https://hal.archives-ouvertes.fr/inria-00625001
Programming by Feedback, Proc. Int. Conf. on Machine Learning (ICML), volume 32 of JMLR Proceedings, pp.1503-1511, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00980839
Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning, pp.49-56, 2007.
Dynamic Programming and Optimal Control, Athena Scientific, 2000.
Learning to Rank with Non-Smooth Cost Functions, pp.193-200, 2006.
Active Teaching in Robot Programming by Demonstration, RO-MAN 2007, The 16th IEEE International Symposium on Robot and Human Interactive Communication, pp.702-707, 2007.
DOI : 10.1109/ROMAN.2007.4415177
A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, vol.2, issue.1-2, pp.1-142, 2013.
DOI : 10.1561/2300000021
Optimism in reinforcement learning and Kullback-Leibler divergence, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp.115-122, 2010.
DOI : 10.1109/ALLERTON.2010.5706896
URL : https://hal.archives-ouvertes.fr/hal-00476116
Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, Machine Learning, pp.123-156, 2012.
DOI : 10.1007/s10994-012-5313-8
The grand challenge of computer Go, Communications of the ACM, vol.55, issue.3, pp.106-113, 2012.
DOI : 10.1145/2093548.2093574
URL : https://hal.archives-ouvertes.fr/hal-00695370
Combining online and offline knowledge in UCT, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.273-280, 2007.
DOI : 10.1145/1273496.1273531
URL : https://hal.archives-ouvertes.fr/inria-00164003
Bayes-Adaptive Simulation-based Search with Value Function Approximation, pp.451-459, 2014.
Actor-critic reinforcement learning with energy-based policies, Eur. Wshop on Reinforcement Learning, JMLR Proceedings, pp.43-58, 2012.
Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, p.51, 2009.
DOI : 10.1145/1553374.1553426
Learning Trajectory Preferences for Manipulators via Iterative Improvement, 2013.
A support vector method for multivariate performance measures, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.377-384, 2005.
DOI : 10.1145/1102351.1102399
Training a Robot via Human Feedback: A Case Study, Proc. 5th Intl Conf. on Social Robotics, pp.460-470, 2013.
DOI : 10.1007/978-3-319-02675-6_46
Bandit Based Monte-Carlo Planning, Proc. Eur. Conf. on Machine Learning and Knowledge Discovery from Databases (ECML PKDD), pp.282-293, 2006.
DOI : 10.1007/11871842_29
Constructing skill trees for reinforcement learning agents from demonstration trajectories, pp.1162-1170, 2010.
Least-squares policy iteration, 2003.
A tutorial on energy-based learning, Predicting Structured Data, 2006.
McRank: Learning to rank using multiple classification and gradient boosting, Advances in Neural Information Processing Systems, pp.897-904, 2007.
Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, pp.529-533, 2015.
DOI : 10.1038/nature14236
Algorithms for Inverse Reinforcement Learning, Proc. Int. Conf. on Machine Learning (ICML), pp.663-670, 2000.
Computational approaches to motor learning by imitation, Philosophical Transactions of the Royal Society B: Biological Sciences, vol.358, issue.1431, pp.537-547, 2003.
DOI : 10.1098/rstb.2002.1258
Generalization in reinforcement learning: Successful examples using sparse coarse coding, pp.1038-1044, 1995.
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192
Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010.
DOI : 10.2200/S00268ED1V01Y201005AIM009
A parallel network that learns to play backgammon, Artificial Intelligence, vol.39, issue.3, pp.357-390, 1989.
DOI : 10.1016/0004-3702(89)90017-9
A Bayesian Approach for Policy Learning from Trajectory Preference Queries, pp.1142-1150, 2012.