M. Lukoševičius and H. Jaeger, Reservoir computing approaches to recurrent neural network training, Computer Science Review, vol.3, issue.3, pp.127-149, 2009.
DOI : 10.1016/j.cosrev.2009.03.005

M. Lagoudakis, R. Parr, and M. Littman, Least-squares methods in reinforcement learning for control, Proc. of the 2nd Hellenic Conference on Artificial Intelligence, LNCS vol.2308, p.249, 2002.

M. Lungarella, G. Metta, R. Pfeifer, and G. Sandini, Developmental robotics: a survey, Connection Science, vol.15, issue.4, pp.151-190, 2003.
DOI : 10.2307/1131322
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.201.7764

C. Watkins, Learning from delayed rewards, Ph.D. dissertation, King's College, Cambridge, 1989.

N. P. Rougier and Y. Boniface, Dynamic self-organising map, Neurocomputing, vol.74, issue.11, pp.1840-1847, 2011.
DOI : 10.1016/j.neucom.2010.06.034
URL : https://hal.archives-ouvertes.fr/inria-00495827

L. Sarzyniec, O. Buffet, and A. Dutech, Apprentissage par renforcement développemental en robotique autonome, Conférence Francophone d'Apprentissage, 2011.

J. Legrand, Apprentissage par renforcement développemental

T. Kohonen, Self-organized formation of topologically correct feature maps, Biological Cybernetics, vol.43, issue.1, pp.59-69, 1982.
DOI : 10.1007/BF00337288

D. Bertsekas and J. Tsitsiklis, Neuro-dynamic programming, Athena Scientific, 1996.

C. Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan and Claypool, 2010.