inria-00116936, version 2
A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD
Manuel Loth (a, 1, 2), Philippe Preux (b, 1, 2), Manuel Davy (c, 1, 3)
European Symposium on Artificial Neural Networks (2007)
Abstract: This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(lambda), LSTD(lambda), iLSTD, and residual-gradient TD. It is asserted that they all consist in minimizing a gradient function and differ in the form of this function and in their means of minimizing it. Two new schemes are introduced in that framework: Full-gradient TD, which uses a generalization of the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents. These three algorithms form a new intermediate family with the interesting property of making much better use of the samples than TD while keeping a gradient descent scheme, which is useful for complexity issues and optimistic policy iteration.
- a – Université des Sciences et Technologie de Lille - Lille I
- b – Université Charles de Gaulle - Lille III
- c – Ecole Centrale de Lille
- 1 : SEQUEL (INRIA Futurs)
- INRIA – CNRS : UMR8022 – CNRS : UMR8146 – Université Lille 1 - Sciences et Technologies – Université Charles de Gaulle - Lille III – Ecole Centrale de Lille
- 2 : GRAPPA (LIFL)
- CNRS : UMR8022 – Université Charles de Gaulle - Lille III – Université Lille 1 - Sciences et Technologies
- 3 : Laboratoire d'Automatique, Génie Informatique et Signal (LAGIS)
- CNRS : UMR8146 – Université Lille 1 - Sciences et Technologies – Ecole Centrale de Lille
- Domain: Computer Science/Machine Learning
- Keywords: temporal difference, reinforcement learning, Markov decision process
- Available versions: v1 (29-11-2006), v2 (29-11-2006)
- http://hal.inria.fr/inria-00116936
- oai:hal.inria.fr:inria-00116936
- Contributor: Manuel Loth
- Submitted on: Wednesday 29 November 2006, 10:12:47
- Last modified on: Wednesday 22 February 2012, 10:06:31





