J. Baxter and P. Bartlett, Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001.

J. Baxter, P. Bartlett, and L. Weaver, Experiments with infinite-horizon, policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.351-381, 2001.

O. Buffet, A. Dutech, C. , and F. , Adaptive combination of behaviors in an agent, Proceedings of the 15th European Conference on Artificial Intelligence (ECAI'02), 2002.
URL : https://hal.archives-ouvertes.fr/inria-00100766

O. Buffet, A. Dutech, C. , and F. , Automatic generation of an agent's basic behaviors, Proceedings of the second international joint conference on Autonomous agents and multiagent systems , AAMAS '03, 2003.
DOI : 10.1145/860575.860716

B. Digney, Learning hierarchical control structure for multiple tasks and changing environments, Proceedings of the Fifth Conference on the Simulation of Adaptive Behavior (SAB'98), 1998.

M. Humphrys, Action selection methods using reinforcement learning, From Animals to Animats 4: 4th International Conference on Simulation of Adaptive Behavior (SAB-96), 1996.

D. Joslin, A. Nunes, and M. E. Pollack, Tileworld users' manual, 1993.

L. Lin, Self-improving reactive agent based on reinforcement learning, planing and teaching, Machine Learning, pp.293-321, 1992.

R. A. Mccallum, Reinforcement Learning with Selective Perception and Hidden State, 1995.

U. Nehmzow, T. Smithers, and B. Mcgonigle, Increasing behavioural repertoire in a mobile robot, From Animals to Animats: Proceedings of the Second Conference on the Simulation of Adaptive Behavior (SAB'93), 1993.

J. Piaget, La Psychologie de l'Intelligence, 1967.

M. L. Puterman, Markov Decision Processes? Discrete Stochastic Dynamic Programming, 1994.

J. Randløv and P. Alstrøm, Learning to drive a bicycle using reinforcement learning and shaping, Proceedings of the 15th International Conference on Machine LearningICML-98), pp.463-471, 1998.

S. Singh, T. Jaakkola, J. , and M. , Learning Without State-Estimation in Partially Observable Markovian Decision Processes, Proceedings of the 11th International Conference on Machine Learning (ICML'94), 1994.
DOI : 10.1016/B978-1-55860-335-6.50042-8

R. Sutton, D. Precup, and S. Singh, Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales, 1998.

G. Tesauro, Practical issues in temporal difference learning, Machine Learning, pp.257-277, 1992.

T. Tyrrell, Computational Mechanisms for Action Selection, 1993.

J. Urzelai, D. Floreano, M. Dorigo, and M. Colombetti, Incremental Robot Shaping, Connection Science, vol.10, issue.3-4, 1998.
DOI : 10.1080/095400998116486

C. Watkins, Learning from delayed rewards. PhD thesis, King's College of Cambridge, 1989.

J. Weng, A theory of mentally developing robots, Proceedings of the 2nd International Conference on Development and Learning (ICDL'02), 2002.

M. Wooldridge, J. Müller, and M. Tambe, Agent theories, architectures, and languages: A bibliography, Intelligent Agents II, IJCAI'95 Workshop, pp.408-431, 1995.
DOI : 10.1007/3540608052_81