inria-00116936, version 2
A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD
Manuel Loth (a, 1, 2), Philippe Preux (b, 1, 2), Manuel Davy (c, 1, 3)
(2006)
European Symposium on Artificial Neural Networks (2007)
This paper addresses the issue of policy evaluation in Markov Decision Processes using linear function approximation. It provides a unified view of algorithms such as TD(lambda), LSTD(lambda), iLSTD and residual-gradient TD. It argues that they all consist of minimizing a gradient function, and that they differ in the form of this function and in the way they minimize it. Two new schemes are introduced within this framework: Full-gradient TD, which uses a generalization of the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents. These three algorithms form a new intermediate family with the interesting property of making much better use of the samples than TD while keeping a gradient-descent scheme, which is useful with regard to complexity issues and for optimistic policy iteration.
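The unified view can be illustrated with a small sketch. It assumes λ = 0, a fixed policy already folded into the chain's transition matrix `P`, and linear features `phi`. Under these assumptions, the vector b - Aθ, built from the usual LSTD statistics A = E[φ(s)(φ(s) - γφ(s'))ᵀ] and b = E[φ(s) r], plays the role of the "gradient" that each method drives to zero. LSTD(0) solves Aθ = b directly. TD(0) takes one stochastic step per sample. The hypothetical `batch_residual_descent` takes repeated deterministic steps along the full vector. This is only an illustration under stated assumptions, not the authors' Full-gradient TD or EGD TD, and all function names are hypothetical.

```python
import random

# ASSUMPTION: the paper's "gradient function" is identified here with b - A theta.


def one_hot(n):
    """Tabular features: phi(s) is the s-th unit vector (an illustrative choice)."""
    return lambda s: [1.0 if i == s else 0.0 for i in range(n)]


def sample_transitions(P, r, start, steps, seed=0):
    """Simulate (s, reward, s_next) transitions of the chain induced by the evaluated policy."""
    rng = random.Random(seed)
    s, out = start, []
    for _ in range(steps):
        s_next = rng.choices(range(len(P)), weights=P[s])[0]
        out.append((s, r[s], s_next))
        s = s_next
    return out


def accumulate(transitions, phi, gamma):
    """Sample averages of A = phi(s)(phi(s) - gamma*phi(s'))^T and b = phi(s)*r."""
    k = len(phi(transitions[0][0]))
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for s, rew, s2 in transitions:
        f, f2 = phi(s), phi(s2)
        for i in range(k):
            b[i] += f[i] * rew
            for j in range(k):
                A[i][j] += f[i] * (f[j] - gamma * f2[j])
    n = len(transitions)
    return [[a / n for a in row] for row in A], [x / n for x in b]


def residual(A, b, theta):
    """The vector b - A*theta, which all three methods below drive to zero."""
    return [bi - sum(a * t for a, t in zip(row, theta)) for row, bi in zip(A, b)]


def solve(A, b):
    """Gaussian elimination with partial pivoting (assumes A is invertible)."""
    k = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, k):
            f = M[i][c] / M[c][c]
            for j in range(c, k + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (M[i][k] - sum(M[i][j] * x[j] for j in range(i + 1, k))) / M[i][i]
    return x


def lstd0(transitions, phi, gamma):
    """LSTD(0): set the residual to zero in one shot by solving A*theta = b."""
    return solve(*accumulate(transitions, phi, gamma))


def td0(transitions, phi, gamma, alpha):
    """TD(0): one stochastic step theta += alpha * delta * phi(s) per transition."""
    theta = [0.0] * len(phi(transitions[0][0]))
    for s, rew, s2 in transitions:
        f, f2 = phi(s), phi(s2)
        v = sum(a * t for a, t in zip(f, theta))
        v2 = sum(a * t for a, t in zip(f2, theta))
        delta = rew + gamma * v2 - v
        theta = [t + alpha * delta * fi for t, fi in zip(theta, f)]
    return theta


def batch_residual_descent(transitions, phi, gamma, alpha, iters):
    """Hypothetical deterministic variant: repeated steps along the full b - A*theta."""
    A, b = accumulate(transitions, phi, gamma)
    theta = [0.0] * len(b)
    for _ in range(iters):
        theta = [t + alpha * g for t, g in zip(theta, residual(A, b, theta))]
    return theta


# Usage: a two-state chain alternating 0 -> 1 -> 0, reward 1 in state 0, gamma = 0.5.
# The exact values satisfy V0 = 1 + 0.5*V1 and V1 = 0.5*V0, i.e. V = [4/3, 2/3].
P, r, gamma = [[0.0, 1.0], [1.0, 0.0]], [1.0, 0.0], 0.5
data = sample_transitions(P, r, start=0, steps=2000)
phi = one_hot(2)
print(lstd0(data, phi, gamma))
print(td0(data, phi, gamma, alpha=0.1))
print(batch_residual_descent(data, phi, gamma, alpha=1.0, iters=200))
```

On this toy chain, all three estimates approach V = [4/3, 2/3]. LSTD(0) needs a k × k solve. The two descent schemes keep O(k) (TD) or O(k²) (batch) work per step, which is the trade-off the abstract alludes to.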
a –  Université des Sciences et Technologies de Lille - Lille I
b –  Université Charles de Gaulle - Lille III
c –  Ecole Centrale de Lille
1:  SEQUEL (INRIA Futurs)
INRIA – CNRS : UMR8022 – CNRS : UMR8146 – Université des Sciences et Technologies de Lille - Lille I – Université Charles de Gaulle - Lille III – Ecole Centrale de Lille
2:  Grappa - LIFL (GRAPPA)
CNRS : UMR8022 – Université Charles de Gaulle - Lille III
3:  Laboratoire d'Automatique, Génie Informatique et Signal (LAGIS)
CNRS : UMR8146 – Université des Sciences et Technologies de Lille - Lille I – Ecole Centrale de Lille
Computer Science/Learning
Keywords: temporal difference, reinforcement learning, Markov decision process