B. D. Argall, S. Chernova, M. Veloso, and B. Browning, A survey of robot learning from demonstration, Robotics and Autonomous Systems, vol.57, issue.5, pp.469-483, 2009.
DOI : 10.1016/j.robot.2008.10.024

A. Y. Ng and S. J. Russell, Algorithms for inverse reinforcement learning, ICML, pp.663-670, 2000.

J. Ho and S. Ermon, Generative adversarial imitation learning, NIPS, pp.4565-4573, 2016.

P. Abbeel and A. Y. Ng, Apprenticeship learning via inverse reinforcement learning, Proceedings of the Twenty-first International Conference on Machine Learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015430

U. Syed, M. H. Bowling, and R. E. Schapire, Apprenticeship learning using linear programming, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.1032-1039, 2008.
DOI : 10.1145/1390156.1390286

URL : http://icml2008.cs.helsinki.fi/papers/645.pdf

B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, Maximum entropy inverse reinforcement learning, AAAI, pp.1433-1438, 2008.

N. D. Ratliff, D. Silver, and J. A. Bagnell, Learning to search: Functional gradient techniques for imitation learning, Autonomous Robots, vol.27, issue.1, pp.25-53, 2009.
DOI : 10.1007/s10514-009-9121-3

URL : http://www.cs.cmu.edu/~ndr/documents/learch.pdf

J. Ho, J. K. Gupta, and S. Ermon, Model-free imitation learning with policy optimization, ICML, pp.2760-2769, 2016.

M. Pirotta and M. Restelli, Inverse reinforcement learning through policy gradient minimization, AAAI, pp.1993-1999, 2016.

E. Klein, B. Piot, M. Geist, and O. Pietquin, A Cascaded Supervised Learning Approach to Inverse Reinforcement Learning, ECML/PKDD, pp.1-16, 2013.
DOI : 10.1007/978-3-642-40988-2_1

URL : https://hal.archives-ouvertes.fr/hal-00869804

B. Piot, M. Geist, and O. Pietquin, Boosted and reward-regularized classification for apprenticeship learning, AAMAS, pp.1249-1256, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01107837

C. Finn, S. Levine, and P. Abbeel, Guided cost learning: Deep inverse optimal control via policy optimization, ICML, pp.49-58, 2016.

N. D. Ratliff, J. A. Bagnell, and M. Zinkevich, Maximum margin planning, Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pp.729-736, 2006.
DOI : 10.1145/1143844.1143936

URL : http://www-clmc.usc.edu/publications/R/ratliff-ICML2006.pdf

J. Audiffren, M. Valko, A. Lazaric, and M. Ghavamzadeh, Maximum entropy semi-supervised inverse reinforcement learning, IJCAI, pp.3315-3321, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01146187

G. Neu and C. Szepesvári, Training parsers by inverse reinforcement learning, Machine Learning, vol.77, issue.2-3, pp.303-337, 2009.
DOI : 10.1007/s10994-009-5110-1

URL : https://link.springer.com/content/pdf/10.1007%2Fs10994-009-5110-1.pdf

S. Levine, Z. Popovic, and V. Koltun, Feature construction for inverse reinforcement learning, NIPS, pp.1342-1350, 2010.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

A. Y. Ng, D. Harada, and S. J. Russell, Policy invariance under reward transformations: Theory and application to reward shaping, ICML, pp.278-287, 1999.

J. Nocedal and S. J. Wright, Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2006.

R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, NIPS, pp.1057-1063, 1999.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, issue.3-4, pp.229-256, 1992.

W. Böhmer, S. Grünewälder, Y. Shen, M. Musial, and K. Obermayer, Construction of approximation spaces for reinforcement learning, Journal of Machine Learning Research, vol.14, issue.1, pp.2067-2118, 2013.

S. Mahadevan, Proto-value functions, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pp.553-560, 2005.
DOI : 10.1145/1102351.1102421

S. Mahadevan and M. Maggioni, Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes, Journal of Machine Learning Research, vol.8, pp.2169-2231, 2007.

S. Mahadevan, M. Maggioni, K. Ferguson, and S. Osentoski, Learning representation and control in continuous Markov decision processes, AAAI, pp.1194-1199, 2006.

S. Kakade, A natural policy gradient, NIPS, pp.1531-1538, 2001.

T. Furmston and D. Barber, A unifying perspective of parametric policy search methods for Markov decision processes, NIPS, pp.2717-2725, 2012.

G. Manganini, M. Pirotta, M. Restelli, and L. Bascetta, Following Newton direction in Policy Gradient with parameter exploration, 2015 International Joint Conference on Neural Networks (IJCNN), pp.1-8, 2015.
DOI : 10.1109/IJCNN.2015.7280673

S. Parisi, M. Pirotta, and M. Restelli, Multi-objective reinforcement learning through continuous Pareto manifold approximation, Journal of Artificial Intelligence Research, vol.57, pp.187-227, 2016.

R. Parr, C. Painter-Wakefield, L. Li, and M. L. Littman, Analyzing feature generation for value-function approximation, Proceedings of the 24th International Conference on Machine Learning, ICML '07, pp.737-744, 2007.
DOI : 10.1145/1273496.1273589

URL : http://www.cs.duke.edu/~parr/icml07.pdf

A. M. Farahmand and D. Precup, Value pursuit iteration, NIPS, pp.1349-1357, 2012.

P. Englert and M. Toussaint, Inverse KKT: Learning cost functions of manipulation tasks from demonstrations, Proceedings of the International Symposium of Robotics Research, 2015.

T. G. Dietterich, Hierarchical reinforcement learning with the MAXQ value function decomposition, Journal of Artificial Intelligence Research, vol.13, pp.227-303, 2000.

P. Dorato, V. Cerone, and C. Abdallah, Linear Quadratic Control: An Introduction, 2000.

D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.

C. Hwang and A. S. M. Masud, Multiple Objective Decision Making, Methods and Applications: A State-of-the-Art Survey, 2012.
DOI : 10.1007/978-3-642-45511-7

J. M. Vidal, Fundamentals of Multiagent Systems, 2006.

E. Mengi, E. A. Yildirim, and M. Kilic, Numerical optimization of eigenvalues of Hermitian matrix functions, SIAM Journal on Matrix Analysis and Applications, vol.35, issue.2, pp.699-724, 2014.
DOI : 10.1137/130933472

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, CoRR, abs/1412.6980, 2014.