On the generation of Markov decision processes, Journal of the Operational Research Society, vol.46, pp.354-361, 1995. ,
Neuro-Dynamic Programming, 1996. ,
Finite-sample analysis of least-squares policy iteration, Journal of Machine Learning Research, vol.13, pp.3041-3074, 2012. ,
Least squares policy evaluation algorithms with linear function approximation, Theory and Applications, vol.13, pp.79-110, 2002. ,
On the use of non-stationary policies for stationary infinite-horizon Markov decision processes, NIPS 2012 Adv.in Neural Information Processing Systems, 2012. ,
An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, 1997. ,