A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201

C. M. Bishop, Pattern Recognition and Machine Learning, 2006.

C. M. Bishop and M. E. Tipping, Variational relevance vector machines, In: Uncertainty in Artificial Intelligence, pp.46-53, 2000.

D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, Variational Inference: A Review for Statisticians, Journal of the American Statistical Association, vol.112, issue.518, pp.859-877, 2017.
DOI : 10.1080/01621459.2017.1285773
URL : http://arxiv.org/abs/1601.00670

J. Boyan, Technical update: Least-squares temporal difference learning, Machine Learning, vol.49, issue.2/3, pp.233-246, 2002.
DOI : 10.1023/A:1017936530646

S. Bradtke and A. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol.22, issue.1, pp.33-57, 1996.
DOI : 10.1007/978-0-585-33656-5_4
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.857

C. Dann, G. Neumann, and J. Peters, Policy evaluation with temporal differences: A survey and comparison, Journal of Machine Learning Research, vol.15, pp.809-883, 2014.

B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression, Annals of Statistics, vol.32, pp.407-499, 2004.

Y. Engel, S. Mannor, and R. Meir, Gaussian Process Reinforcement Learning, International Conference on Machine Learning, pp.201-208, 2005.
DOI : 10.1007/978-1-4899-7502-7_109-1

A. M. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor, Regularized policy iteration, Advances in Neural Information Processing Systems 21, pp.441-448, 2008.

M. Geist and B. Scherrer, ℓ1-Penalized Projected Bellman Residual, Recent Advances in Reinforcement Learning - 9th European Workshop, pp.89-101, 2011.
DOI : 10.1007/978-3-642-29946-9_12
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.220.6770

M. Geist, B. Scherrer, A. Lazaric, and M. Ghavamzadeh, A Dantzig selector approach to temporal difference learning, International Conference on Machine Learning, pp.1399-1406, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00749480

A. Geramifard, M. Bowling, and R. S. Sutton, Incremental least-squares temporal difference learning, The Twenty-first National Conference on Artificial Intelligence (AAAI), pp.356-361, 2006.

M. Ghavamzadeh, A. Lazaric, R. Munos, and M. W. Hoffman, Finite-sample analysis of Lasso-TD, International Conference on Machine Learning, pp.1177-1184, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00830149

T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning: data mining, inference and prediction, 2009.

M. W. Hoffman, A. Lazaric, M. Ghavamzadeh, and R. Munos, Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization, Recent Advances in Reinforcement Learning - 9th European Workshop, pp.102-114, 2011.
DOI : 10.1007/978-3-642-29946-9_13

J. Johns, C. Painter-Wakefield, and R. Parr, Linear complementarity for regularized policy evaluation and improvement, Advances in Neural Information Processing Systems 23, pp.1009-1017, 2010.

M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, An Introduction to Variational Methods for Graphical Models, Machine Learning, vol.37, issue.2, pp.183-233, 1999.
DOI : 10.1007/978-94-011-5014-9_5
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.3844

J. Z. Kolter and A. Y. Ng, Regularization and feature selection in least-squares temporal difference learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.521-528, 2009.
DOI : 10.1145/1553374.1553442
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.5506

M. Lagoudakis and R. Parr, Least-squares policy iteration, The Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Finite-sample analysis of LSTD, International Conference on Machine Learning, pp.615-622, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482189

B. Liu, L. Zhang, and J. Liu, Dantzig selector with an approximately optimal denoising matrix and its application in sparse reinforcement learning, Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, UAI, 2016.

S. Mannor, D. Simester, P. Sun, and J. N. Tsitsiklis, Bias and variance in value function estimation, Twenty-first International Conference on Machine Learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015402
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.946

A. Nedić and D. P. Bertsekas, Least squares policy evaluation algorithms with linear function approximation, Discrete Event Dynamic Systems, vol.13, issue.1/2, pp.79-110, 2003.
DOI : 10.1023/A:1022192903948

C. Painter-Wakefield and R. Parr, Greedy algorithms for sparse reinforcement learning, International Conference on Machine Learning, 2012.

G. Parisi, Statistical field theory, Frontiers in Physics, 1988.

B. A. Pires, Statistical analysis of l1-penalized linear estimation with applications, 2011.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 2005.
DOI : 10.1002/9780470316887

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

R. Sutton, H. Maei, D. Precup, S. Bhatnagar, D. Silver et al., Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.993-1000, 2009.
DOI : 10.1145/1553374.1553501
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.5674

M. E. Tipping, Sparse Bayesian learning and the relevance vector machine, Journal of Machine Learning Research, vol.1, pp.211-244, 2001.

N. Tziortziotis, Machine Learning for Intelligent Agents, Greece, 2015.

N. Tziortziotis and K. Blekas, Value Function Approximation through Sparse Bayesian Modeling, Recent Advances in Reinforcement Learning - 9th European Workshop, pp.128-139, 2011.
DOI : 10.1007/978-3-642-29946-9_15
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.231.3634