R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

R. Howard, Dynamic programming and Markov processes, 1960.

M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Probability and Mathematical Statistics, 1994.

R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, Neural Information Processing Systems (NIPS), pp.1057-1063, 1999.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, pp.229-256, 1992.

V. R. Konda and J. N. Tsitsiklis, On Actor-Critic Algorithms, SIAM Journal on Control and Optimization, vol.42, issue.4, pp.1143-1166, 2003.
DOI : 10.1137/S0363012901385691

J. Peters and S. Schaal, Policy gradient methods for robotics, IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.2219-2225, 2006.

S. I. Amari, Natural Gradient Works Efficiently in Learning, Neural Computation, vol.10, issue.2, pp.251-276, 1998.
DOI : 10.1162/089976698300017746

J. Peters and S. Schaal, Natural actor-critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008.

S. Bhatnagar, R. Sutton, M. Ghavamzadeh, and M. Lee, Incremental natural actor-critic algorithms, Advances in Neural Information Processing Systems, pp.105-112, 2008.

M. Riedmiller, J. Peters, and S. Schaal, Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.254-261, 2007.
DOI : 10.1109/ADPRL.2007.368196

S. E. Fahlman and C. Lebiere, The cascade-correlation learning architecture, Advances in Neural Information Processing Systems, pp.524-532, 1989.

M. Riedmiller and H. Braun, A direct adaptive method for faster backpropagation learning: the RPROP algorithm, IEEE International Conference on Neural Networks, pp.586-591, 1993.
DOI : 10.1109/ICNN.1993.298623

I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, vol.3, pp.1157-1182, 2003.

I. Menache, S. Mannor, and N. Shimkin, Basis Function Adaptation in Temporal Difference Reinforcement Learning, Annals of Operations Research, vol.134, issue.1, pp.215-238, 2005.
DOI : 10.1007/s10479-005-5732-z

P. W. Keller, S. Mannor, and D. Precup, Automatic basis function construction for approximate dynamic programming and reinforcement learning, Proceedings of the 23rd international conference on Machine learning, ICML '06, pp.449-456, 2006.
DOI : 10.1145/1143844.1143901

R. Parr, C. Painter-Wakefield, L. Li, and M. Littman, Analyzing feature generation for value-function approximation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.737-744, 2007.
DOI : 10.1145/1273496.1273589

S. Mahadevan, Representation policy iteration, UAI, pp.372-379, 2005.

J. Johns and S. Mahadevan, Constructing basis functions from directed graphs for value function approximation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.385-392, 2007.
DOI : 10.1145/1273496.1273545

S. Mahadevan and M. Maggioni, Proto-value functions, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.2169-2231, 2007.
DOI : 10.1145/1102351.1102421

F. Rivest and D. Precup, Combining TD-learning with cascade-correlation networks, pp.632-639, 2003.