A survey of robot learning from demonstration, Robotics and Autonomous Systems, vol.57, issue.5, pp.469-483, 2009.
DOI : 10.1016/j.robot.2008.10.024
Algorithms for inverse reinforcement learning, ICML, pp.663-670, 2000.
Generative adversarial imitation learning, NIPS, pp.4565-4573, 2016.
Apprenticeship learning via inverse reinforcement learning, Twenty-first international conference on Machine learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015430
Apprenticeship learning using linear programming, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.1032-1039, 2008.
DOI : 10.1145/1390156.1390286
URL : http://icml2008.cs.helsinki.fi/papers/645.pdf
Maximum entropy inverse reinforcement learning, AAAI, pp.1433-1438, 2008.
Learning to search: Functional gradient techniques for imitation learning, Autonomous Robots, vol.27, issue.1, pp.25-53, 2009.
DOI : 10.1007/s10514-009-9121-3
URL : http://www.cs.cmu.edu/~ndr/documents/learch.pdf
Model-free imitation learning with policy optimization, ICML Conference Proceedings, pp.2760-2769, 2016.
Inverse reinforcement learning through policy gradient minimization, AAAI, 2016.
A Cascaded Supervised Learning Approach to Inverse Reinforcement Learning, ECML/PKDD, pp.1-16, 2013.
DOI : 10.1007/978-3-642-40988-2_1
URL : https://hal.archives-ouvertes.fr/hal-00869804
Boosted and reward-regularized classification for apprenticeship learning, AAMAS, pp.1249-1256, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01107837
Guided cost learning: Deep inverse optimal control via policy optimization, ICML Conference Proceedings, pp.49-58, 2016.
Maximum margin planning, Proceedings of the 23rd international conference on Machine learning, ICML '06, pp.729-736, 2006.
DOI : 10.1145/1143844.1143936
URL : http://www-clmc.usc.edu/publications/R/ratliff-ICML2006.pdf
Maximum entropy semi-supervised inverse reinforcement learning, IJCAI, pp.3315-3321, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01146187
Training parsers by inverse reinforcement learning, Machine Learning, pp.303-337, 2009.
DOI : 10.1007/s10994-009-5110-1
URL : https://link.springer.com/content/pdf/10.1007%2Fs10994-009-5110-1.pdf
Feature construction for inverse reinforcement learning, NIPS, pp.1342-1350, 2010.
Markov decision processes: Discrete stochastic dynamic programming, 1994.
Policy invariance under reward transformations: Theory and application to reward shaping, ICML, pp.278-287, 1999.
Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2006.
Policy gradient methods for reinforcement learning with function approximation, NIPS, pp.1057-1063, 1999.
Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine learning, vol.8, issue.3-4, pp.229-256, 1992.
Construction of approximation spaces for reinforcement learning, Journal of Machine Learning Research, vol.14, issue.1, pp.2067-2118, 2013.
Proto-value functions, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.553-560, 2005.
DOI : 10.1145/1102351.1102421
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes, Journal of Machine Learning Research, vol.8, pp.2169-2231, 2007.
Learning representation and control in continuous Markov decision processes, AAAI, pp.1194-1199, 2006.
A natural policy gradient, NIPS, pp.1531-1538, 2001.
A unifying perspective of parametric policy search methods for Markov decision processes, Advances in neural information processing systems, pp.2717-2725, 2012.
Following Newton direction in Policy Gradient with parameter exploration, 2015 International Joint Conference on Neural Networks (IJCNN), pp.1-8, 2015.
DOI : 10.1109/IJCNN.2015.7280673
Multi-objective reinforcement learning through continuous Pareto manifold approximation, Journal of Artificial Intelligence Research, vol.57, pp.187-227, 2016.
Analyzing feature generation for value-function approximation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.737-744, 2007.
DOI : 10.1145/1273496.1273589
URL : http://www.cs.duke.edu/~parr/icml07.pdf
Value pursuit iteration, NIPS, pp.1349-1357, 2012.
Inverse KKT - Learning Cost Functions of Manipulation Tasks from Demonstrations, Proceedings of the International Symposium of Robotics Research, 2015.
DOI : 10.1108/17563781211255862
Hierarchical reinforcement learning with the MAXQ value function decomposition, Journal of Artificial Intelligence Research, vol.13, pp.227-303, 2000.
Linear Quadratic Control: An Introduction, 2000.
Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.
Multiple objective decision making-methods and applications: a state-of-the-art survey, 2012.
DOI : 10.1007/978-3-642-45511-7
Fundamentals of multiagent systems, 2006.
Numerical Optimization of Eigenvalues of Hermitian Matrix Functions, SIAM Journal on Matrix Analysis and Applications, vol.35, issue.2, pp.699-724, 2014.
DOI : 10.1137/130933472
Adam: A method for stochastic optimization, CoRR, abs/1412, 2014.