D. Bertsekas and S. Ioffe, Temporal differences-based policy iteration and applications in neuro-dynamic programming, 1996.

J. A. Boyan, Technical update: Least-squares temporal difference learning, Machine Learning, pp.233-246, 2002.

S. J. Bradtke and A. G. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996.

M. Kearns and S. Singh, Bias-variance error bounds for temporal difference updates, Proceedings of the 13th Annual Conference on Computational Learning Theory, pp.142-147, 2000.

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

M. G. Lagoudakis, R. Parr, and M. L. Littman, Least-Squares Methods in Reinforcement Learning for Control, SETN'02: Proceedings of the Second Hellenic Conference on AI, pp.249-260, 2002.
DOI : 10.1007/3-540-46014-4_23

A. Nedić and D. P. Bertsekas, Least squares policy evaluation algorithms with linear function approximation, Discrete Event Dynamic Systems, pp.79-110, 2003.

R. Schoknecht, Optimality of reinforcement learning algorithms with linear function approximation, NIPS, pp.1555-1562, 2002.

R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver et al., Fast gradient-descent methods for temporal-difference learning with linear function approximation, ICML'09: Proceedings of the 26th Annual International Conference on Machine Learning, pp.993-1000, 2009.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

C. Szepesvári and R. Munos, Finite time bounds for sampling based fitted value iteration, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pp.880-887, 2005.
DOI : 10.1145/1102351.1102462

C. Thiery and B. Scherrer, Building Controllers for Tetris, ICGA Journal, vol.32, issue.1, pp.3-11, 2009.
DOI : 10.3233/ICG-2009-32102
URL : https://hal.archives-ouvertes.fr/inria-00418954

C. Thiery and B. Scherrer, Performance bound for Approximate Optimistic Policy Iteration
URL : https://hal.archives-ouvertes.fr/inria-00480952

H. Yu and D. P. Bertsekas, Convergence Results for Some Temporal Difference Methods Based on Least Squares, IEEE Trans. Automatic Control, vol.54, pp.1515-1531, 2009.