W. Banzhaf, F. D. Francone, R. E. Keller, and P. Nordin, Genetic programming: an introduction: on the automatic evolution of computer programs and its applications, 1998.

D. Bertsekas and S. Ioffe, Temporal differences-based policy iteration and applications in neuro-dynamic programming, 1996.

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

R. Coulom, Reinforcement Learning Using Neural Networks, with Applications to Motor Control, 2002.
URL : https://hal.archives-ouvertes.fr/tel-00003985

A. Fukunaga, A. Stechert, and D. M., A genome compiler for high performance genetic programming, in J. R. Koza, W. Banzhaf et al. (eds.), Genetic Programming 1998: Proceedings of the Third Annual Conference, pp.86-94, 1998.

GNU Lightning, available from http…, 2007.

J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, 1992.

J. R. Koza, Genetic Programming II: Automatic Discovery of Reusable Programs, 1994.

J. R. Koza, M. A. Keane, M. J. Streeter, W. Mydlowec, J. Yu et al., Genetic Programming IV: Routine Human-Competitive Machine Intelligence, 2003.

K. Krawiec, Genetic programming-based construction of features for machine learning and knowledge discovery tasks, Genetic Programming and Evolvable Machines, vol.3, issue.4, pp.329-343, 2002.
DOI : 10.1023/A:1020984725014

P. Nordin, A compiling genetic programming system that directly manipulates the machine code, Advances in Genetic Programming, pp.311-331, 1994.

M. Riedmiller, J. Peters, and S. Schaal, Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.254-261, 2007.
DOI : 10.1109/ADPRL.2007.368196

S. Sanner, Online feature discovery in relational reinforcement learning, Open Problems in Statistical Relational Learning Workshop (SRL-06), 2006.

B. Scherrer, Performance bounds for lambda policy iteration, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00185271

M. G. Smith and L. Bull, Genetic programming with a genetic algorithm for feature construction and selection, Genetic Programming and Evolvable Machines, vol.6, issue.3, pp.265-281, 2005.

M. W. Spong, Swing up control of the acrobot, ICRA, pp.2356-2361, 1994.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 1998.

R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, NIPS, pp.1057-1063, 1999.

Publisher: INRIA, Domaine de Voluceau, Rocquencourt, BP 105, 78153 Le Chesnay Cedex (France), http://www.inria.fr, ISSN 0249-6399