Apprenticeship learning via inverse reinforcement learning, Twenty-first international conference on Machine learning , ICML '04, 2004. ,
DOI : 10.1145/1015330.1015430
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.92
Preference-Based Policy Learning, pp.12-27, 2011. ,
DOI : 10.1007/978-3-642-23780-5_11
URL : https://hal.archives-ouvertes.fr/inria-00625001
APRIL: Active Preference Learning-Based Reinforcement Learning, pp.116-131 ,
DOI : 10.1007/978-3-642-33486-3_8
URL : https://hal.archives-ouvertes.fr/hal-00722744
Multiple instance ranking, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.48-55, 2008. ,
DOI : 10.1145/1390156.1390163
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.8314
Neuro-Dynamic Programming, Athena Scientific, 1996. ,
DOI : 10.1007/0-306-48332-7_333
On Learning, Representing, and Generalizing a Task in a Humanoid Robot, IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), vol.37, issue.2, pp.286-298, 2007. ,
DOI : 10.1109/TSMCB.2006.886952
Preference-Based Policy Iteration: Leveraging Preference Learning for Reinforcement Learning, pp.312-327, 2011. ,
DOI : 10.1007/978-3-642-23780-5_30
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.224.8007
Preference learning with Gaussian processes, Proceedings of the 22nd international conference on Machine learning , ICML '05, pp.137-144, 2005. ,
DOI : 10.1145/1102351.1102369
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.134.4878
Rollout sampling approximate policy iteration, Machine Learning, vol.4, issue.1, pp.157-171, 2008. ,
DOI : 10.1007/s10994-008-5069-3
URL : http://arxiv.org/abs/0805.2027
Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, p.51, 2009. ,
DOI : 10.1145/1553374.1553426
Inverse reinforcement learning through structured classification, pp.1016-1024, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00778624
Reinforcement Learning in Robotics: A Survey, pp.579-610, 2012. ,
Hierarchical apprenticeship learning with application to quadruped locomotion, 2007. ,
Constructing skill trees for reinforcement learning agents from demonstration trajectories, NIPS 23, pp.1162-1170, 2010. ,
Least-squares policy iteration, Journal of Machine Learning Research (JMLR), vol.4, pp.1107-1149, 2003. ,
Individual choice behavior, 1959. ,
Algorithms for inverse reinforcement learning, pp.663-670, 2000. ,
Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space, Psychometrika, vol.3, issue.4, p.325345, 1957. ,
DOI : 10.1007/BF02288967
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998. ,
DOI : 10.1109/TNN.1998.712192
Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, p.ICML, 2009. ,
DOI : 10.1145/1553374.1553501
Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010. ,
DOI : 10.2200/S00268ED1V01Y201005AIM009
Optimal Bayesian recommendation sets and myopically optimal choice query sets, pp.2352-2360, 2010. ,
A bayesian approach for policy learning from trajectory preference queries, pp.1142-1150, 2012. ,