A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201

T. Archibald, K. McKinnon, and L. Thomas, On the Generation of Markov Decision Processes, Journal of the Operational Research Society, vol.46, issue.3, pp.354-361, 1995.
DOI : 10.1057/jors.1995.50

J. Baxter and P. L. Bartlett, Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research (JAIR), vol.15, pp.319-350, 2001.

D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, 1995.

S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, Incremental natural actor-critic algorithms, Advances in Neural Information Processing Systems (NIPS), 2007.

A. Fern, S. Yoon, and R. Givan, Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes, Journal of Artificial Intelligence Research (JAIR), vol.25, pp.75-118, 2006.

M. Ghavamzadeh and A. Lazaric, Conservative and Greedy Approaches to Classification-based Policy Iteration, Conference on Artificial Intelligence (AAAI), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00772610

V. Heidrich-Meisner and C. Igel, Evolution Strategies for Direct Policy Search, International Conference on Parallel Problem Solving from Nature (PPSN X), pp.428-437, 2008.
DOI : 10.1007/978-3-540-87700-4_43

S. Kakade, A Natural Policy Gradient, Advances in Neural Information Processing Systems (NIPS), 2001.

S. Kakade and J. Langford, Approximately optimal approximate reinforcement learning, International Conference on Machine Learning (ICML), 2002.

J. Kober and J. Peters, Policy Search for Motor Primitives in Robotics, Machine Learning, vol.84, issue.1-2, pp.171-203, 2011.

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research (JMLR), vol.4, pp.1107-1149, 2003.

M. Lagoudakis and R. Parr, Reinforcement learning as classification: Leveraging modern classifiers, International Conference on Machine Learning (ICML), 2003.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Finite-sample analysis of least-squares policy iteration, Journal of Machine Learning Research (JMLR), vol.13, pp.3041-3074, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00528596

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a classification-based policy iteration algorithm, International Conference on Machine Learning (ICML), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065

L. Mason, J. Baxter, P. Bartlett, and M. Frean, Boosting algorithms as gradient descent in function space, Tech. rep., Australian National University, 1999.

R. Munos, Error bounds for approximate policy iteration, International Conference on Machine Learning (ICML), 2003.

R. Munos, Performance bounds in Lp norm for approximate value iteration, SIAM Journal on Control and Optimization, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00124685

J. Peters and S. Schaal, Natural Actor-Critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008.
DOI : 10.1016/j.neucom.2007.11.026

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, 1994.
DOI : 10.1002/9780470316887

B. Scherrer, V. Gabillon, M. Ghavamzadeh, and M. Geist, Approximate Modified Policy Iteration, International Conference on Machine Learning (ICML), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758882

B. Scherrer and B. Lesner, On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes, Advances in Neural Information Processing Systems (NIPS), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758809

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, Advances in Neural Information Processing Systems (NIPS), 1999.