Improved algorithms for linear stochastic bandits, Advances in Neural Information Processing Systems, 2011.
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201
Residual Algorithms: Reinforcement Learning with Function Approximation, Proceedings of the Twelfth International Conference on Machine Learning, pp.30-37, 1995.
DOI : 10.1016/B978-1-55860-377-6.50013-X
Dynamic Programming and Optimal Control, volume II, Athena Scientific, 2007.
Neuro-Dynamic Programming, Athena Scientific, 1996.
DOI : 10.1007/0-306-48332-7_333
Least-squares temporal difference learning, Proceedings of the 16th International Conference on Machine Learning, pp.49-56, 1999.
Linear least-squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996.
DOI : 10.1007/978-0-585-33656-5_4
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.857
Exponential inequalities for self-normalized processes with applications, Electronic Communications in Probability, vol.14, pp.372-381, 2009.
DOI : 10.1214/ECP.v14-1490
Pseudo-maximization and self-normalized processes, Probability Surveys, vol.4, pp.172-192, 2007.
DOI : 10.1214/07-PS119
Nonparametric regression with martingale increment errors, Stochastic Processes and their Applications, pp.2899-2924, 2011.
DOI : 10.1016/j.spa.2011.08.002
URL : https://hal.archives-ouvertes.fr/hal-00530581
Regularized policy iteration, Advances in Neural Information Processing Systems 21, pp.441-448, 2008.
Error propagation for approximate policy and value iteration, Advances in Neural Information Processing Systems, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830154
Classification-based policy iteration with a critic, Proceedings of the Twenty-Eighth International Conference on Machine Learning, pp.1049-1056, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00590972
LSTD with random projections, Advances in Neural Information Processing Systems, pp.721-729, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00943120
Finite-sample analysis of Lasso-TD, Proceedings of the 28th International Conference on Machine Learning, pp.1177-1184, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00830149
A Distribution-free Theory of Nonparametric Regression, 2002.
DOI : 10.1007/b97848
Random Design Analysis of Ridge Regression, Proceedings of the 25th Conference on Learning Theory, 2012.
DOI : 10.1007/s10208-014-9192-1
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.
Finite-sample analysis of LSTD, Proceedings of the 27th International Conference on Machine Learning, pp.615-622, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482189
Nonparametric time series prediction through adaptive model selection, Machine Learning, pp.5-34, 2000.
Markov Chains and Stochastic Stability, 1993.
Statistical linear estimation with penalized estimators: an application to reinforcement learning, Proceedings of the 29th International Conference on Machine Learning, 2012.
Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view, Proceedings of the 27th International Conference on Machine Learning, pp.959-966, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00537403
Generalized polynomial approximations in Markovian decision processes, Journal of Mathematical Analysis and Applications, vol.110, issue.2, pp.568-582, 1985.
DOI : 10.1016/0022-247X(85)90317-8
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192
The Generic Chaining: Upper and Lower Bounds of Stochastic Processes, 2005.
An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, pp.674-690, 1997.
DOI : 10.1109/9.580874
Rates of convergence for empirical processes of stationary mixing sequences, The Annals of Probability, pp.94-116, 1994.
Convergence of least squares temporal difference methods under general conditions, Proceedings of the 27th International Conference on Machine Learning, pp.1207-1214, 2010.
Error Bounds for Approximations from Projected Linear Equations, Mathematics of Operations Research, vol.35, issue.2, pp.306-329, 2010.
DOI : 10.1287/moor.1100.0441