D. Abel, A. Agarwal, F. Diaz et al., Exploratory gradient boosting for reinforcement learning in complex domains, ICML Workshop on Reinforcement Learning and Abstraction, 2016.

A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol. 71, no. 1, pp. 89-129, 2008.
DOI : 10.1007/11776420_42

URL : https://hal.archives-ouvertes.fr/inria-00117130

A. G. Barto, R. S. Sutton, and C. W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics, vol. 13, no. 5, pp. 834-846, 1983.
DOI : 10.1109/TSMC.1983.6313077

P. Bühlmann and T. Hothorn, Boosting algorithms: Regularization, prediction and model fitting, Statistical Science, vol. 22, no. 4, pp. 477-505, 2007.
DOI : 10.1214/07-STS242

K. Doya, Reinforcement learning in continuous time and space, Neural Computation, vol. 12, no. 1, pp. 219-245, 2000.

URL : http://meta.rad.atr.co.jp/doya/papers/nc99.ps.gz

D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol. 6, pp. 503-556, 2005.

A. M. Farahmand, Regularization in Reinforcement Learning, PhD thesis, University of Alberta, 2011.

A. M. Farahmand and D. Precup, Value pursuit iteration, NIPS, pp. 1349-1357, 2012.

A. M. Farahmand, M. Ghavamzadeh, C. Szepesvári et al., Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems, 2009 American Control Conference, pp. 725-730, 2009.
DOI : 10.1109/ACC.2009.5160611

URL : http://webdocs.cs.ualberta.ca/~amir/papers/RFQI(ACC2009).pdf

A. M. Farahmand, R. Munos, and C. Szepesvári, Error propagation for approximate policy and value iteration, NIPS, pp. 568-576, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830154

M. Milani Fard, Y. Grinberg, A. M. Farahmand et al., Bellman error based feature generation using random projections on sparse spaces, NIPS, pp. 3030-3038, 2013.

J. H. Friedman, Greedy function approximation: A gradient boosting machine, Annals of Statistics, vol. 29, no. 5, pp. 1189-1232, 2001.

P. Geurts, D. Ernst, and L. Wehenkel, Extremely randomized trees, Machine Learning, vol. 63, no. 1, pp. 3-42, 2006.
DOI : 10.1007/s10994-006-6226-1

URL : https://hal.archives-ouvertes.fr/hal-00341932

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, Adaptive Computation and Machine Learning Series, MIT Press, 2016.

G. J. Gordon, Stable function approximation in dynamic programming, Proceedings of the Twelfth International Conference on Machine Learning (ICML), pp. 261-268, 1995.
DOI : 10.1016/B978-1-55860-377-6.50040-2

L. Györfi, M. Kohler, A. Krzyzak et al., A Distribution-Free Theory of Nonparametric Regression, Springer Series in Statistics, 2002.

T. J. Hastie and R. J. Tibshirani, Generalized Additive Models, Monographs on Statistics and Applied Probability, Chapman & Hall, 1990.

J. Kober, J. A. Bagnell, and J. Peters, Reinforcement learning in robotics: A survey, The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238-1274, 2013.

URL : http://www.ri.cmu.edu/pub_files/2013/7/Kober_IJRR_2013.pdf

S. Mahadevan and M. Maggioni, Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes, Journal of Machine Learning Research, vol. 8, pp. 2169-2231, 2007.

O.-A. Maillard, R. Munos, A. Lazaric et al., Finite-sample analysis of Bellman residual minimization, ACML, volume 13 of JMLR Proceedings, pp. 299-314, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830212

V. Mnih, K. Kavukcuoglu, D. Silver et al., Human-level control through deep reinforcement learning, Nature, vol. 518, no. 7540, pp. 529-533, 2015.
DOI : 10.1038/nature14236

J. E. Moody and M. Saffell, Reinforcement learning for trading, NIPS, pp. 917-923, 1998.

R. Munos and C. Szepesvári, Finite-time bounds for fitted value iteration, Journal of Machine Learning Research, vol. 9, pp. 815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882

R. Munos, T. Stepleton, A. Harutyunyan, and M. G. Bellemare, Safe and efficient off-policy reinforcement learning, NIPS, pp. 1046-1054, 2016.

R. Parr, C. Painter-Wakefield, L. Li et al., Analyzing feature generation for value-function approximation, Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 737-744, 2007.
DOI : 10.1145/1273496.1273589

URL : http://www.cs.duke.edu/~parr/icml07.pdf

B. Piot, M. Geist, and O. Pietquin, Boosted Bellman residual minimization handling expert demonstrations, ECML/PKDD, pp. 549-564, 2014.
DOI : 10.1007/978-3-662-44851-9_35

URL : https://hal.archives-ouvertes.fr/hal-01060953

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, 1994.
DOI : 10.1002/9780470316887

J. Randløv and P. Alstrøm, Learning to drive a bicycle using reinforcement learning and shaping, ICML, pp.463-471, 1998.

M. Riedmiller, Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method, ECML, pp. 317-328, 2005.
DOI : 10.1007/11564096_32

URL : http://www.ni.uos.de/fileadmin/user_upload/publications/riedmiller.ecml2005.official.pdf

D. Silver, A. Huang, C. J. Maddison, A. Guez et al., Mastering the game of Go with deep neural networks and tree search, Nature, vol. 529, no. 7587, pp. 484-489, 2016.
DOI : 10.1038/nature16961

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

A. W. van der Vaart and J. A. Wellner, Preservation theorems for Glivenko-Cantelli and uniform Glivenko-Cantelli classes, in High Dimensional Probability II, Birkhäuser Boston, pp. 115-133, 2000.
DOI : 10.1007/978-1-4612-1358-1_9