T. W. Archibald, K. I. M. McKinnon, and L. C. Thomas, On the Generation of Markov Decision Processes, Journal of the Operational Research Society, vol. 46, no. 3, pp. 354–361, 1995.
DOI : 10.1057/jors.1995.50

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

J. A. Boyan, Technical update: Least-squares temporal difference learning, Machine Learning, vol. 49, no. 2–3, pp. 233–246, 2002.

T. P. Hayes, A large-deviation inequality for vector-valued martingales, 2005.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Finite-sample analysis of least-squares policy iteration, Journal of Machine Learning Research, vol. 13, pp. 3041–3074, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00528596

A. Nedić and D. P. Bertsekas, Least squares policy evaluation algorithms with linear function approximation, Discrete Event Dynamic Systems: Theory and Applications, vol. 13, pp. 79–110, 2003.

B. Scherrer, Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view, ICML, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00537403

B. Scherrer and B. Lesner, On the use of non-stationary policies for stationary infinite-horizon Markov decision processes, Advances in Neural Information Processing Systems (NIPS), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758809

C. Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, no. 1, 2010.
DOI : 10.2200/S00268ED1V01Y201005AIM009

J. N. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol. 42, no. 5, pp. 674–690, 1997.
DOI : 10.1109/9.580874

B. Yu, Rates of convergence for empirical processes of stationary mixing sequences, The Annals of Probability, vol. 22, no. 1, pp. 94–116, 1994.

H. Yu, Convergence of least-squares temporal difference methods under general conditions, ICML, 2010.