D. P. Bertsekas and S. Ioffe, Temporal differences-based policy iteration and applications in neuro-dynamic programming, 1997.

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-dynamic programming, Athena Scientific, 1996.

V. F. Farias and B. Van Roy, Tetris: A Study of Randomized Constraint Sampling, chapter 6 of Probabilistic and Randomized Methods for Design Under Uncertainty, 2006.

S. M. Kakade, A Natural Policy Gradient, Advances in Neural Information Processing Systems, pp. 1531-1538, 2001.

R. Parr, L. Li, G. Taylor, C. Painter-Wakefield, and M. L. Littman, An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning, Proceedings of the 25th International Conference on Machine Learning, ICML '08, 2008.
DOI: 10.1145/1390156.1390251

W. B. Powell, Approximate dynamic programming for high-dimensional problems, Tutorial presented at the IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007.

M. L. Puterman, Markov decision processes: Discrete stochastic dynamic programming, 2005.

R. S. Sutton and A. G. Barto, Reinforcement learning, 1998.