Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.22, issue.1, pp.89-129, 2008. ,
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201
Dynamic Programming and Optimal Control, Athena Scientific, 2001. ,
Least-squares temporal difference learning, Proceedings of the 16th International Conference on Machine Learning, pp.49-56, 1999. ,
Linear least-squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996. ,
A distribution-free theory of nonparametric regression, 2002. ,
DOI : 10.1007/b97848
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003. ,
Nonparametric time series prediction through adaptive model selection, Machine Learning, pp.5-34, 2000. ,
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998. ,
DOI : 10.1109/TNN.1998.712192
An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, pp.674-690, 1997. ,
DOI : 10.1109/9.580874
Rates of convergence for empirical processes of stationary mixing sequences. The Annals of Probability, pp.94-116, 1994. ,