A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD

Manuel Loth (1, 2), Philippe Preux (1, 2), Manuel Davy (1, 3)
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
3 LAGIS-SI
LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract: This paper addresses policy evaluation in Markov Decision Processes using linear function approximation. It provides a unified view of algorithms such as TD(lambda), LSTD(lambda), iLSTD, and residual-gradient TD. It is asserted that they all consist in minimizing a gradient function, and that they differ in the form of this function and in their means of minimizing it. Two new schemes are introduced in that framework: Full-gradient TD, which uses a generalization of the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents. These three algorithms (iLSTD, Full-gradient TD, and EGD TD) form a new intermediate family with the interesting property of making much better use of the samples than TD while keeping a gradient descent scheme, which is useful for complexity issues and for optimistic policy iteration.
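For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of the standard TD(lambda) update with linear function approximation. It is not taken from the paper and does not implement Full-gradient TD or EGD TD; the function name `td_lambda`, its parameters, and the two-state chain used as data are all assumptions made for illustration.

```python
import numpy as np

def td_lambda(transitions, features, gamma=0.9, lam=0.5, alpha=0.1, sweeps=200):
    """Textbook TD(lambda) for policy evaluation with V(s) ~ features[s] @ w.

    transitions: list of (state, reward, next_state); next_state is None
                 when the transition ends an episode (hypothetical format).
    features:    array of shape (n_states, n_features).
    """
    w = np.zeros(features.shape[1])
    for _ in range(sweeps):
        z = np.zeros_like(w)                      # eligibility trace
        for s, r, s_next in transitions:
            phi = features[s]
            v_next = 0.0 if s_next is None else features[s_next] @ w
            delta = r + gamma * v_next - phi @ w  # temporal-difference error
            z = gamma * lam * z + phi             # accumulate the trace
            w += alpha * delta * z                # stochastic gradient-like step
            if s_next is None:
                z[:] = 0.0                        # reset trace at episode end
    return w

# Illustrative data: chain 0 -> 1 -> terminal, rewards 0 then 1,
# one-hot (tabular) features so the exact values are representable.
example_features = np.eye(2)
example_transitions = [(0, 0.0, 1), (1, 1.0, None)]
example_w = td_lambda(example_transitions, example_features)
```

Each sample is used for a single small step and then discarded, which is the sample inefficiency that, according to the abstract, the intermediate family is meant to improve on.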
Document type:
Conference paper
European Symposium on Artificial Neural Networks, Apr 2007, Bruges, Belgium. 2007


https://hal.inria.fr/inria-00116936
Contributor: Manuel Loth
Submitted on: Wednesday, 29 November 2006 - 10:12:47
Last modified on: Thursday, 11 January 2018 - 06:26:40
Document(s) archived on: Monday, 20 September 2010 - 16:41:52

Files

unified.pdf
Files produced by the author(s)

Citation

Manuel Loth, Philippe Preux, Manuel Davy. A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD. European Symposium on Artificial Neural Networks, Apr 2007, Bruges, Belgium. 2007. 〈inria-00116936v2〉
