A. Antos, C. Szepesvári, and R. Munos, Learning Near-optimal Policies with Bellman-residual Minimization based Fitted Policy Iteration and a Single Sample Path, p.COLT, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00830201

L. C. Baird, Residual Algorithms: Reinforcement Learning with Function Approximation, p.ICML, 1995.
DOI : 10.1016/B978-1-55860-377-6.50013-X

D. P. Bertsekas and H. Yu, Projected equation methods for approximate solution of large linear systems, Journal of Computational and Applied Mathematics, vol.227, issue.1, pp.27-50, 2009.
DOI : 10.1016/j.cam.2008.07.037

J. A. Boyan, Technical Update: Least-Squares Temporal Difference Learning, Machine Learning, vol.49, issue.2/3, pp.233-246, 1999.
DOI : 10.1023/A:1017936530646

S. J. Bradtke and A. G. Barto, Linear Least-Squares algorithms for temporal difference learning, Machine Learning, vol.22, issue.1-3, pp.33-57, 1996.

D. Choi and B. Van-roy, A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, Discrete Event Dynamic Systems, vol.22, issue.1,2,3, pp.207-239, 2006.
DOI : 10.1007/s10626-006-8134-8

Y. Engel, Algorithms and Representations for Reinforcement Learning, 2005.

M. Geist and O. Pietquin, Eligibility traces through colored noises, International Congress on Ultra Modern Telecommunications and Control Systems, p.ICUMT, 2010.
DOI : 10.1109/ICUMT.2010.5676597

URL : https://hal.archives-ouvertes.fr/hal-00553910

M. Geist and O. Pietquin, Parametric value function approximation: A unified view, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), p.ADPRL, 2011.
DOI : 10.1109/ADPRL.2011.5967355

URL : https://hal.archives-ouvertes.fr/hal-00618112

M. Kearns and S. Singh, Bias-Variance Error Bounds for Temporal Difference Updates, In: COLT, 2000.

H. R. Maei and R. S. Sutton, GQ(?): A general gradient algorithm for temporaldifference prediction learning with eligibility traces, In: Conference on Artificial General Intelligence, 2010.

R. Munos, Error Bounds for Approximate Policy Iteration, p.ICML, 2003.

A. Nedi´cnedi´c and D. P. Bertsekas, Least Squares Policy Evaluation Algorithms with Linear Function Approximation, DEDS, vol.13, pp.79-110, 2003.

D. Precup, R. S. Sutton, and S. P. Singh, Eligibility Traces for Off-Policy Policy Evaluation, p.ICML, 2000.

B. D. Ripley, Stochastic Simulation, 1987.
DOI : 10.1002/9780470316726

B. Scherrer, Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view, p.ICML, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00537403

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning), 1998.
DOI : 10.1007/978-1-4615-3618-5

J. Tsitsiklis and B. Van-roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, pp.674-690, 1997.
DOI : 10.1109/9.580874

H. Yu, Convergence of Least-Squares Temporal Difference Methods under General Conditions, p.ICML, 2010.