A. Antos, R. Munos, and C. Szepesvári, Fitted Q-iteration in continuous action-space MDPs, Advances in Neural Information Processing Systems (NIPS), 2007.
URL : https://hal.archives-ouvertes.fr/inria-00185311

A. Antos, C. Szepesvári, and R. Munos, Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), pp. 330-337, 2007.
DOI : 10.1109/ADPRL.2007.368207

URL : https://hal.archives-ouvertes.fr/inria-00124833

D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

D. Bertsekas and H. Yu, Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming, Mathematics of Operations Research, vol. 37, no. 1, pp. 66-94, 2012.
DOI : 10.1287/moor.1110.0532

P. Canbolat and U. Rothblum, (Approximate) iterated successive approximations algorithm for sequential decision processes, Annals of Operations Research, pp. 1-12, 2012.
DOI : 10.1007/s10479-012-1073-x

A. Farahmand, R. Munos, and C. Szepesvári, Error propagation for approximate policy and value iteration (extended version), Advances in Neural Information Processing Systems (NIPS), 2010.

S. Kakade, On the Sample Complexity of Reinforcement Learning, Ph.D. thesis, University College London, 2003.

R. Munos, Error bounds for approximate policy iteration, Proceedings of the International Conference on Machine Learning (ICML), pp. 560-567, 2003.

R. Munos, Performance Bounds in $L_p$-norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, vol. 46, no. 2, pp. 541-561, 2007.
DOI : 10.1137/040614384

M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, 1994.
DOI : 10.1002/9780470316887

M. Puterman and M. Shin, Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, Management Science, vol. 24, no. 11, pp. 1127-1137, 1978.
DOI : 10.1287/mnsc.24.11.1127

B. Scherrer and B. Lesner, On the use of non-stationary policies for stationary infinite-horizon Markov decision processes, Advances in Neural Information Processing Systems 25 (NIPS), pp. 1835-1843, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758809

B. Scherrer and C. Thiery, Performance bound for approximate optimistic policy iteration, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00480952

B. Scherrer, M. Ghavamzadeh, V. Gabillon, and M. Geist, Approximate modified policy iteration, Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 1207-1214, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758882

S. Singh and R. Yee, An upper bound on the loss from approximate optimal-value functions, Machine Learning, vol. 16, no. 3, pp. 227-233, 1994.
DOI : 10.1007/BF00993308