A. Antos, C. Szepesvári, and R. Munos, Learning Near-optimal Policies with Bellman-residual Minimization based Fitted Policy Iteration and a Single Sample Path, COLT, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00830201

T. Archibald, K. McKinnon, and L. Thomas, On the Generation of Markov Decision Processes, Journal of the Operational Research Society, vol.46, issue.3, pp.354-361, 1995.
DOI : 10.1057/jors.1995.50

L. C. Baird, Residual Algorithms: Reinforcement Learning with Function Approximation, ICML, 1995.
DOI : 10.1016/B978-1-55860-377-6.50013-X

D. Bertsekas and S. Ioffe, Temporal differences-based policy iteration and applications in neuro-dynamic programming, 1996.

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

D. P. Bertsekas and H. Yu, Projected equation methods for approximate solution of large linear systems, Journal of Computational and Applied Mathematics, vol.227, issue.1, pp.27-50, 2009.
DOI : 10.1016/j.cam.2008.07.037

L. Bottou and O. Bousquet, The tradeoffs of large scale learning, Optimization for Machine Learning, pp.351-368, 2011.

S. J. Bradtke and A. G. Barto, Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, vol.22, issue.1-3, pp.33-57, 1996.

D. Choi and B. Van Roy, A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, Discrete Event Dynamic Systems, vol.16, issue.2, pp.207-239, 2006.
DOI : 10.1007/s10626-006-8134-8

Y. Engel, Algorithms and Representations for Reinforcement Learning, PhD thesis, The Hebrew University of Jerusalem, 2005.

M. Geist and O. Pietquin, Eligibility traces through colored noises, International Congress on Ultra Modern Telecommunications and Control Systems, 2010.
DOI : 10.1109/ICUMT.2010.5676597

URL : https://hal.archives-ouvertes.fr/hal-00553910

M. Geist and O. Pietquin, Kalman Temporal Differences, JAIR, vol.39, pp.483-532, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00351297

M. Geist and O. Pietquin, Algorithmic Survey of Parametric Value Function Approximation, IEEE Transactions on Neural Networks and Learning Systems, 2013.
DOI : 10.1109/TNNLS.2013.2247418

URL : https://hal.archives-ouvertes.fr/hal-00869725

M. Kearns and S. Singh, Bias-Variance Error Bounds for Temporal Difference Updates, COLT, 2000.

J. Z. Kolter, The Fixed Points of Off-Policy TD, Neural Information Processing Systems (NIPS), 2011.

H. R. Maei and R. S. Sutton, GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, Conference on Artificial General Intelligence, 2010.

R. Munos, Error Bounds for Approximate Policy Iteration, ICML, 2003.

A. Nedić and D. P. Bertsekas, Least Squares Policy Evaluation Algorithms with Linear Function Approximation, Discrete Event Dynamic Systems, vol.13, pp.79-110, 2003.

D. Precup, R. S. Sutton, and S. P. Singh, Eligibility Traces for Off-Policy Policy Evaluation, ICML, 2000.

D. Precup, R. S. Sutton, and S. Dasgupta, Off-Policy Temporal-Difference Learning with Function Approximation, Proceedings of the 18th International Conference on Machine Learning, 2001.

R. S. Randhawa and S. Juneja, Combining importance sampling and temporal difference control variates to simulate Markov Chains, ACM Transactions on Modeling and Computer Simulation, vol.14, issue.1, pp.1-30, 2004.
DOI : 10.1145/974734.974735

B. D. Ripley, Stochastic Simulation, 1987.
DOI : 10.1002/9780470316726

B. Scherrer, Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view, ICML, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00537403

B. Scherrer and M. Geist, Recursive Least-Squares Learning with Eligibility Traces, European Workshop on Reinforcement Learning (EWRL 11), 2011.
DOI : 10.1007/978-3-642-29946-9_14

URL : https://hal.archives-ouvertes.fr/hal-00644511

R. Schoknecht, Optimality of Reinforcement Learning Algorithms with Linear Function Approximation, NIPS, 2002.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver et al., Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553501

J. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, pp.674-690, 1997.
DOI : 10.1109/9.580874