inria-00116936, version 2

A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD

Manuel Loth (a, 1, 2), Philippe Preux (b, 1, 2), Manuel Davy (c, 1, 3)

European Symposium on Artificial Neural Networks (2007)

Abstract: This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(lambda), LSTD(lambda), iLSTD, and residual-gradient TD. It argues that they all amount to minimizing a gradient function, and that they differ in the form of this function and in the means of minimizing it. Two new schemes are introduced in that framework: Full-gradient TD, which uses a generalization of the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents. These three algorithms form a new intermediate family with the interesting property of making much better use of the samples than TD while keeping a gradient descent scheme, which is useful for complexity issues and optimistic policy iteration.

  • a –  Université des Sciences et Technologies de Lille - Lille I
  • b –  Université Charles de Gaulle - Lille III
  • c –  Ecole Centrale de Lille
  • 1:  SEQUEL (INRIA Futurs)
  • INRIA – CNRS : UMR8022 – CNRS : UMR8146 – Université des Sciences et Technologies de Lille - Lille I – Université Charles de Gaulle - Lille III – Ecole Centrale de Lille
  • 2:  GRAPPA (LIFL)
  • CNRS : UMR8022 – Université Charles de Gaulle - Lille III – Université des Sciences et Technologies de Lille - Lille I
  • 3:  Laboratoire d'Automatique, Génie Informatique et Signal (LAGIS)
  • CNRS : UMR8146 – Université des Sciences et Technologies de Lille - Lille I – Ecole Centrale de Lille
  • Domain : Computer Science/Learning
  • Keywords : temporal difference, reinforcement learning, Markov decision process
  • Available versions :  v1 (2006-11-29) v2 (2006-11-29)
 
  • oai:hal.inria.fr:inria-00116936
  • Submitted on: Wednesday, 29 November 2006 10:12:47
  • Updated on: Monday, 28 May 2007 19:56:53