Temporal differences-based policy iteration and applications in neuro-dynamic programming, 1997.
Neuro-dynamic programming, Athena Scientific, 1996.
Probabilistic and Randomized Methods for Design Under Uncertainty, chapter 6: Tetris: A Study of Randomized Constraint Sampling, 2006.
A Natural Policy Gradient, Advances in Neural Information Processing Systems, pp. 1531-1538, 2001.
An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning, Proceedings of the 25th International Conference on Machine Learning, ICML '08, 2008. DOI: 10.1145/1390156.1390251
Approximate dynamic programming for high-dimensional problems, Tutorial presented at the IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007.
Markov decision processes: Discrete stochastic dynamic programming, 2005.
Reinforcement learning, 1998.