R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

C. Leemon and I. Baird, Residual algorithms: Reinforcement learning with function approximation, International Conference on Machine Learning, pp.30-37, 1995.

J. Boyan, Least-squares temporal difference learning, Proc. 16th International Conference on Machine Learning, pp.49-56, 1999.

A. Nediç and D. P. Bertsekas, Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems, pp.79-110, 2003.

A. Geramifard, M. Bowling, and R. Sutton, Incremental least-squares temporal difference learning, Proceeding of American Association for Artificial Intelligence (AAAI), pp.356-361, 2006.

A. Geramifard, M. Bowling, M. Zinkevich, and R. Sutton, iLSTD: Eligibility traces & convergence analysis, Proceeding of Neural Information Processing Systems Conference, 2006.

M. Loth, A unified view of td algorithms ? introducing full-gradient td and equi-gradient descent td, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00116936

M. Loth, Equi-gradient descent, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00116936

N. John, B. Tsitsiklis, and R. Van, An analysis of temporal-difference learning with function approximation, 1996.