A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp.89-129, 2008.

A. M. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor, Regularized Policy Iteration, 2008.

R. Munos, Error Bounds for Approximate Policy Iteration, 2003.

Y. Saad, Iterative Methods for Sparse Linear Systems, 2003.
DOI: 10.1137/1.9780898718003

R. Schoknecht, Optimality of Reinforcement Learning Algorithms with Linear Function Approximation, 2002.

R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver et al., Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation, 2009.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI: 10.1109/TNN.1998.712192

R. J. Williams and L. C. Baird, Tight performance bounds on greedy policies based on imperfect value functions, 1993.

H. Yu and D. P. Bertsekas, New Error Bounds for Approximations from Projected Linear Equations, 2008.