M. Shadlen and W. Newsome, The variable discharge of cortical neurons: Implications for connectivity, computation, and information coding, Journal of Neuroscience, vol.18, issue.10, pp.3870-3896, 1998.

W. Softky and C. Koch, The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs, Journal of Neuroscience, vol.13, issue.1, pp.334-350, 1993.

C. Harris and D. Wolpert, Signal-dependent noise determines motor planning, Nature, vol.394, issue.6695, pp.780-784, 1998.
DOI : 10.1038/29528

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
DOI : 10.1007/978-1-4615-3618-5

URL : https://hal.archives-ouvertes.fr/hal-00764281

R. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, issue.3-4, pp.229-256, 1992.

Y. Takahashi, G. Schoenbaum, and Y. Niv, Silencing the critics: understanding the effects of cocaine sensitization on dorsolateral and ventral striatum in the context of an actor/critic model, Frontiers in Neuroscience, vol.2, issue.1, pp.86-99, 2008.
DOI : 10.3389/neuro.01.014.2008

K. Doya, Reinforcement Learning in Continuous Time and Space, Neural Computation, vol.12, issue.1, pp.219-245, 2000.
DOI : 10.1162/089976600300015961

J. Peters and S. Schaal, Natural Actor-Critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008.
DOI : 10.1016/j.neucom.2007.11.026

J. Baxter and P. Bartlett, Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001.

J. Baxter, P. Bartlett, and L. Weaver, Experiments with infinite-horizon, policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.351-381, 2001.

S. Amari, Dynamics of pattern formation in lateral-inhibition type neural fields, Biological Cybernetics, vol.27, issue.2, pp.77-87, 1977.
DOI : 10.1007/BF00337259

G. Schöner, M. Dose, and C. Engels, Dynamics of behavior: Theory and applications for autonomous robot architectures, Robotics and Autonomous Systems, vol.16, issue.2-4, pp.213-245, 1995.
DOI : 10.1016/0921-8890(95)00049-6

S. Funahashi, C. J. Bruce, and P. S. Goldman-Rakic, Mnemonic coding of visual space in the monkey's dorsolateral prefrontal cortex, Journal of Neurophysiology, vol.61, issue.2, pp.331-349, 1989.

R. Ben-Yishai, R. L. Bar-Or, and H. Sompolinsky, Theory of orientation tuning in visual cortex, Proceedings of the National Academy of Sciences USA, vol.92, issue.9, pp.3844-3848, 1995.
DOI : 10.1073/pnas.92.9.3844

E. Todorov and M. I. Jordan, Optimal feedback control as a theory of motor coordination, Nature Neuroscience, vol.5, issue.11, pp.1226-1235, 2002.
DOI : 10.1038/nn963

P. Viviani and R. Schneider, A developmental study of the relationship between geometry and kinematics in drawing movements, Journal of Experimental Psychology: Human Perception and Performance, vol.17, issue.1, pp.198-218, 1991.
DOI : 10.1037/0096-1523.17.1.198

T. Flash and N. Hogan, The coordination of arm movements: An experimentally confirmed mathematical model, Journal of Neuroscience, vol.5, issue.7, pp.1688-1703, 1985.

J. A. Tropp, Greed is good: algorithmic results for sparse approximation, IEEE Transactions on Information Theory, vol.50, issue.10, pp.2231-2242, 2004.

X. Xie and H. S. Seung, Learning in neural networks by reinforcement of irregular spiking, Physical Review E, vol.69, issue.4, 2004.
DOI : 10.1103/PhysRevE.69.041909

D. Baras and R. Meir, Reinforcement Learning, Spike-Time-Dependent Plasticity, and the BCM Rule, Neural Computation, vol.19, issue.8, pp.2245-2279, 2007.
DOI : 10.1162/neco.2007.19.8.2245

W. Gerstner and W. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity, Cambridge University Press, 2002.

R. V. Florian, A reinforcement learning algorithm for spiking neural networks, Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC'05), pp.299-306, 2005.

E. Daucé, A model of cell specialization using a Hebbian policy-gradient approach with "slow" noise, Proceedings of the 19th International Conference on Artificial Neural Networks, Part I, pp.218-228, 2009.