Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.22, issue.1, pp.89-129, 2008. ,
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201
Least-squares temporal difference learning, Proceedings of the 16th International Conference on Machine Learning, pp.49-56, 1999. ,
Linear least-squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996. ,
Regularized policy iteration, Proceedings of Advances in Neural Information Processing Systems 21, pp.441-448, 2008. ,
Regularized fitted Qiteration for planning in continuous-space Markovian decision problems, Proceedings of the American Control Conference, pp.725-730, 2009. ,
Automatic basis function construction for approximate dynamic programming and reinforcement learning, Proceedings of the 23rd international conference on Machine learning , ICML '06, pp.449-456, 2006. ,
DOI : 10.1145/1143844.1143901
Regularization and feature selection in least-squares temporal difference learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.521-528, 2009. ,
DOI : 10.1145/1553374.1553442
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003. ,
Finite-sample analysis of least-squares policy iteration, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00528596
Finite-sample analysis of LSTD, Proceedings of the Twenty-Seventh International Conference on Machine Learning, pp.615-622, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00482189
Sparse Temporal Difference Learning Using LASSO, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.352-359, 2007. ,
DOI : 10.1109/ADPRL.2007.368210
URL : https://hal.archives-ouvertes.fr/inria-00117075
Representation policy iteration, Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, pp.372-379, 2005. ,
Compressed least-squares regression, Proceedings of Advances in Neural Information Processing Systems 22, pp.1213-1221, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00419210
Brownian motions and scrambled wavelets for least-squares regression, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00483017
Basis Function Adaptation in Temporal Difference Reinforcement Learning, Annals of Operations Research, vol.34, issue.1/2/3, pp.215-238, 2005. ,
DOI : 10.1007/s10479-005-5732-z
Finite time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00120882
Analyzing feature generation for valuefunction approximation, Proceedings of the Twenty-Fourth International Conference on Machine Learning, pp.737-744, 2007. ,
Feature selection using regularization in approximate linear programs for Markov decision processes, Proceedings of the Twenty- Seventh International Conference on Machine Learning, pp.871-878, 2010. ,
Non-asymptotic Theory of Random Matrices: Extreme Singular Values, Proceedings of the International Congress of Mathematicians 2010 (ICM 2010), 2010. ,
DOI : 10.1142/9789814324359_0111
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998. ,
DOI : 10.1109/TNN.1998.712192
The Random Projection Method, 2004. ,
DOI : 10.1007/978-1-4615-0013-1_16