J. Baxter and P. Bartlett, Infinite-Horizon Policy-Gradient Estimation, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001.

J. Baxter, P. Bartlett, and L. Weaver, Experiments with Infinite-Horizon, Policy-Gradient Estimation, Journal of Artificial Intelligence Research, vol.15, pp.351-381, 2001.

C. Boutilier, R. Reiter, and B. Price, Symbolic Dynamic Programming for First-order MDPs, Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI'01), pp.690-697, 2001.

O. Buffet, Une double approche modulaire de l'apprentissage par renforcement pour des agents intelligents adaptatifs, Laboratoire Lorrain de recherche en informatique et ses applications (LORIA), 2003.
URL : https://hal.archives-ouvertes.fr/tel-00509349

O. Buffet, A. Dutech, and F. Charpillet, Adaptive Combination of Behaviors in an Agent, Proceedings of the 15th European Conference on Artificial Intelligence (ECAI'02), 2002.

O. Buffet, A. Dutech, and F. Charpillet, Automatic Generation of an Agent's Basic Behaviors, Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS'03), 2003.
DOI : 10.1145/860575.860716

T. Dietterich, Reinforcement Learning with the MAXQ Value Function Decomposition, Journal of Artificial Intelligence Research, vol.13, pp.227-303, 2000.

B. Digney, Learning Hierarchical Control Structure for Multiple Tasks and Changing Environments, Proceedings of the Fifth Conference on the Simulation of Adaptive Behavior (SAB'98), 1998.

A. Dutech, O. Buffet, and F. Charpillet, Multi-Agent Systems by Incremental Gradient Reinforcement Learning, Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI'01), 2001.
URL : https://hal.archives-ouvertes.fr/inria-00101090

C. Genest and J. Zidek, Combining Probability Distributions: A Critique and an Annotated Bibliography, Statistical Science, vol.1, issue.1, pp.114-135, 1986.
DOI : 10.1214/ss/1177013825

C. Gretton and S. Thiébaux, Exploiting First-Order Regression in Inductive Policy Selection, Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI'04), 2004.

M. Hauskrecht, N. Meuleau, L. Kaelbling, T. Dean, and C. Boutilier, Hierarchical Solution of Markov Decision Processes Using Macro-Actions, Proceedings of the Fourteenth International Conference on Uncertainty in Artificial Intelligence (UAI'98), pp.220-229, 1998.

B. Hengst, Discovering Hierarchy in Reinforcement Learning with HEXQ, Proceedings of the Nineteenth International Conference on Machine Learning (ICML'02), pp.243-250, 2002.

M. Humphrys, Action Selection methods using Reinforcement Learning, 4th International Conference on Simulation of Adaptive Behavior (SAB-96), 1996.

T. Jaakkola, M. Jordan, and S. Singh, On the Convergence of Stochastic Iterative Dynamic Programming Algorithms, Neural Computation, vol.6, issue.6, pp.1185-1201, 1994.
DOI : 10.1162/neco.1994.6.6.1185

L. Kaelbling, Hierarchical Learning in Stochastic Domains: Preliminary Results, Proceedings of the Tenth International Conference on Machine Learning (ICML'93), 1993.
DOI : 10.1016/B978-1-55860-307-3.50028-9

M. Kearns, Y. Mansour, and A. Ng, A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, Machine Learning, pp.193-208, 2002.

L. Lin, Self-improving reactive agents based on reinforcement learning, planning and teaching, Machine Learning, pp.293-321, 1992.

L. Lin, Hierarchical Learning of Robot Skills, Proceedings of the IEEE International Conference on Neural Networks (ICNN'93), 1993.

M. Littman, A. Cassandra, and L. Kaelbling, Learning policies for partially observable environments: Scaling up, Proceedings of the 12th International Conference on Machine Learning (ICML'95), 1995.
DOI : 10.1016/B978-1-55860-377-6.50052-9

D. Mackay, Information Theory, Inference, and Learning Algorithms, 2003.

S. Mahadevan and J. Connell, Automatic programming of behavior-based robots using reinforcement learning, Artificial Intelligence, vol.55, issue.2-3, pp.311-365, 1992.
DOI : 10.1016/0004-3702(92)90058-6

R. A. McCallum, Reinforcement Learning with Selective Perception and Hidden State, 1995.

U. Nehmzow, T. Smithers, and B. McGonigle, Increasing Behavioural Repertoire in a Mobile Robot, From Animals to Animats: Proceedings of the Second Conference on the Simulation of Adaptive Behavior (SAB'93), 1993.

R. E. Parr, Hierarchical Control and Learning for Markov Decision Processes, 1998.

L. Peret and F. Garcia, On-line search for solving Markov Decision Processes via heuristic sampling, Proceedings of the 16th European Conference on Artificial Intelligence (ECAI'2004), 2004.

J. Piaget, La Psychologie de l'Intelligence, 1967.

S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 1995.

Y. Shoham, R. Powers, and T. Grenager, Multi-agent reinforcement learning: a critical survey, 2003.

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

T. Tyrrell, Computational Mechanisms for Action Selection, 1993.

C. Watkins, Learning from delayed rewards, PhD thesis, King's College, Cambridge, 1989.

J. Weng, Theory of Mentally Developing Robots, Proceedings of the 2nd International Conference on Development and Learning (ICDL'02), June 2002.