Direct combination, variants: one weight per configuration and action, with correction; 2. (+) one weight per configuration; 5. (*) one weight per configuration; 3. (+) one weight per configuration and action; 6. (*) one weight per configuration and action, with correction.

Reusability of the learned parameters: they can effectively serve as a basis in situations more complex than those for which they were intended.
