| inria-00116936, version 2 |
| arXiv:cs.LG/0611145 |
| European Symposium on Artificial Neural Networks (2007) |
| This paper addresses policy evaluation in Markov Decision Processes using linear function approximation. It provides a unified view of algorithms such as TD(lambda), LSTD(lambda), iLSTD, and residual-gradient TD, arguing that they all minimize a gradient function and differ only in the form of this function and in the means used to minimize it. Two new schemes are introduced in that framework: Full-gradient TD, which generalizes the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents. Together with iLSTD, they form a new intermediate family of three algorithms that makes much better use of the samples than TD while keeping a gradient descent scheme, which helps with computational complexity and suits optimistic policy iteration. |
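To make the contrast in sample usage concrete, here is a minimal sketch of two of the algorithms named in the abstract, in their standard textbook forms: TD(0) as an incremental update and LSTD(0) as a direct linear solve. It is not taken from the paper. The two-state chain, the one-hot features, the discount `GAMMA`, the step size `ALPHA`, and all function names are assumptions chosen for illustration.

```python
# Illustrative sketch only: textbook TD(0) and LSTD(0) with linear features.
# The chain, features, GAMMA and ALPHA are assumptions, not from the paper.
GAMMA = 0.5  # discount factor (assumed)
ALPHA = 0.1  # TD step size (assumed)


def phi(s):
    """One-hot feature vector for a hypothetical 2-state chain."""
    return [1.0, 0.0] if s == 0 else [0.0, 1.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_samples(n):
    """Deterministic cycle 0 -> 1 -> 0 ...; reward 1 when leaving state 0."""
    samples, s = [], 0
    for _ in range(n):
        s_next = 1 - s
        r = 1.0 if s == 0 else 0.0
        samples.append((s, r, s_next))
        s = s_next
    return samples


def td0(samples):
    """One small step per sample along the TD error times the features."""
    theta = [0.0, 0.0]
    for s, r, s_next in samples:
        delta = r + GAMMA * dot(theta, phi(s_next)) - dot(theta, phi(s))
        theta = [t + ALPHA * delta * f for t, f in zip(theta, phi(s))]
    return theta


def lstd0(samples):
    """Accumulate A and b over all samples, then solve A theta = b (2x2)."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for s, r, s_next in samples:
        f, f_next = phi(s), phi(s_next)
        for i in range(2):
            b[i] += f[i] * r
            for j in range(2):
                A[i][j] += f[i] * (f[j] - GAMMA * f_next[j])
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    # Cramer's rule is enough for a 2x2 system.
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


samples = make_samples(2000)
theta_td = td0(samples)
theta_lstd = lstd0(samples)
```

On this toy chain the exact values are V(0) = 4/3 and V(1) = 2/3. LSTD(0) recovers them as soon as both transitions have been seen, whereas TD(0) only approaches them gradually over many samples. The intermediate family described in the abstract sits between these two extremes.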
| 1: | SEQUEL (INRIA Futurs) |
| INRIA – CNRS : UMR8022 – CNRS : UMR8146 – Université des Sciences et Technologies de Lille - Lille I – Université Charles de Gaulle - Lille III – Ecole Centrale de Lille | |
| 2: | Grappa - LIFL (GRAPPA) |
| CNRS : UMR8022 – Université Charles de Gaulle - Lille III | |
| 3: | Laboratoire d'Automatique, Génie Informatique et Signal (LAGIS) |
| CNRS : UMR8146 – Université des Sciences et Technologies de Lille - Lille I – Ecole Centrale de Lille |
| Domain | : | Computer Science/Learning |
| Keywords | : | temporal difference; reinforcement learning; Markov decision process |
| Available versions: | v1 (2006-11-29) | v2 (2006-11-29) |
| http://hal.inria.fr/inria-00116936/en/ | |
| oai:hal.inria.fr:inria-00116936_v2 | |
| From: Manuel Loth | |
| Submitted on: Wednesday, 29 November 2006 10:12:47 | |
| Updated on: Monday, 28 May 2007 19:56:53 | |