Temporal differences-based policy iteration and applications in neuro-dynamic programming, 1997.

Neuro-dynamic programming, Athena Scientific, 1996.

Probabilistic and Randomized Methods for Design Under Uncertainty, chapter 6: Tetris: A Study of Randomized Constraint Sampling, 2006.

A Natural Policy Gradient, Advances in Neural Information Processing Systems, pp. 1531-1538, 2001.

An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning, Proceedings of the 25th International Conference on Machine Learning (ICML '08), 2008. DOI: 10.1145/1390156.1390251

Approximate dynamic programming for high-dimensional problems, Tutorial presented at the IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007.

Markov decision processes: Discrete stochastic dynamic programming, 2005.

Reinforcement learning, 1998.