D. Bertsekas and S. Ioffe, Temporal differences-based policy iteration and applications in neuro-dynamic programming, 1996.

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

R. Coulom, Reinforcement Learning Using Neural Networks with Applications to Motor Control, 2002.
URL : https://hal.archives-ouvertes.fr/tel-00003985

S. E. Fahlman and C. Lebiere, The cascade-correlation learning architecture, Advances in Neural Information Processing Systems, pp.524-532, 1989.

J. Johns and S. Mahadevan, Constructing basis functions from directed graphs for value function approximation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.385-392, 2007.
DOI : 10.1145/1273496.1273545

P. W. Keller, S. Mannor, and D. Precup, Automatic basis function construction for approximate dynamic programming and reinforcement learning, ICML '06: Proceedings of the 23rd international conference on Machine learning, pp.449-456, 2006.

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

S. Mahadevan, Representation policy iteration, UAI, pp.372-379, 2005.

S. Mahadevan and M. Maggioni, Proto-value functions, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.2169-2231, 2007.
DOI : 10.1145/1102351.1102421

I. Menache, S. Mannor, and N. Shimkin, Basis Function Adaptation in Temporal Difference Reinforcement Learning, Annals of Operations Research, vol.134, issue.1, pp.215-238, 2005.
DOI : 10.1007/s10479-005-5732-z

R. Munos, Error bounds for approximate policy iteration, ICML '03: Proceedings of the 20th international conference on Machine learning, pp.560-567, 2003.

R. Parr, C. Painter-Wakefield, L. Li, and M. Littman, Analyzing feature generation for value-function approximation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.737-744, 2007.
DOI : 10.1145/1273496.1273589

M. Riedmiller and H. Braun, A direct adaptive method for faster backpropagation learning: the RPROP algorithm, IEEE International Conference on Neural Networks, pp.586-591, 1993.
DOI : 10.1109/ICNN.1993.298623

F. Rivest and D. Precup, Combining TD-learning with cascade-correlation networks, ICML '03: Proceedings of the 20th international conference on Machine learning, pp.632-639, 2003.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 1998.