Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201
Residual Algorithms: Reinforcement Learning with Function Approximation, Proceedings of the Twelfth International Conference on Machine Learning, pp.30-37, 1995.
DOI : 10.1016/B978-1-55860-377-6.50013-X
Dynamic Programming and Optimal Control, volume II, Athena Scientific, 2007.
Neuro-Dynamic Programming, Athena Scientific, 1996.
DOI : 10.1007/0-306-48332-7_333
Least-squares temporal difference learning, Proceedings of the 16th International Conference on Machine Learning, pp.49-56, 1999.
Linear least-squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996.
DOI : 10.1007/978-0-585-33656-5_4
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.857
Exponential inequalities for self-normalized processes with applications, Electronic Communications in Probability, vol.14, pp.372-381, 2009.
Pseudo-maximization and self-normalized processes, Probability Surveys, vol.4, pp.172-192, 2007.
A distribution-free theory of nonparametric regression, 2002.
DOI : 10.1007/b97848
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.
Finite-sample analysis of LSTD, Proceedings of the 27th International Conference on Machine Learning, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482189
Nonparametric time series prediction through adaptive model selection, Machine Learning, pp.5-34, 2000.
Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view, Proceedings of the 27th International Conference on Machine Learning, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00537403
Generalized polynomial approximations in Markovian decision processes, Journal of Mathematical Analysis and Applications, vol.110, issue.2, pp.568-582, 1985.
DOI : 10.1016/0022-247X(85)90317-8
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192
The Generic Chaining: Upper and Lower Bounds of Stochastic Processes, 2005.
An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, pp.674-690, 1997.
DOI : 10.1109/9.580874
Average cost temporal-difference learning, Automatica, vol.35, issue.11, pp.1799-1808, 1999.
DOI : 10.1016/S0005-1098(99)00099-0
Rates of convergence for empirical processes of stationary mixing sequences, The Annals of Probability, pp.94-116, 1994.
Convergence of least squares temporal difference methods under general conditions, Proceedings of the 27th International Conference on Machine Learning, 2010.
Error Bounds for Approximations from Projected Linear Equations, Mathematics of Operations Research, vol.35, issue.2, pp.306-329, 2010.
DOI : 10.1287/moor.1100.0441