Off-policy Learning with Eligibility Traces: A Survey

Matthieu Geist (1), Bruno Scherrer (2)
(2) MAIA - Autonomous intelligent machine, Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
Abstract: In the framework of Markov Decision Processes, we consider off-policy learning, that is, the problem of learning a linear approximation of the value function of some fixed policy from a single trajectory possibly generated by some other policy. We briefly review the on-policy learning algorithms of the literature (gradient-based and least-squares-based), adopting a unified algorithmic view. We then highlight a systematic approach for adapting them to off-policy learning with eligibility traces. This leads to some known algorithms (off-policy LSTD(λ), LSPE(λ), TD(λ), TDC/GQ(λ)) and suggests new extensions (off-policy FPKF(λ), BRM(λ), gBRM(λ), GTD2(λ)). We describe a comprehensive algorithmic derivation of all algorithms in a recursive and memory-efficient form, discuss their known convergence properties, and illustrate their relative empirical behavior on Garnet problems. Our experiments suggest that the most standard algorithms, on- and off-policy LSTD(λ)/LSPE(λ), and TD(λ) when the feature-space dimension is too large for a least-squares approach, perform best.
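As a minimal illustration of the family of methods the abstract surveys, the sketch below implements off-policy TD(λ) with linear function approximation, where the eligibility trace is corrected by per-step importance weights ρ = π(a|s)/π₀(a|s). The toy MDP, its transition kernel, rewards, and both policies are invented for the example and are not taken from the report; this is one common form of the update (δ-weighted corrected traces), not necessarily the exact variant derived in the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_features = 5, 3
gamma, lam, alpha = 0.9, 0.5, 0.05
phi = rng.standard_normal((n_states, n_features))  # fixed random linear features

theta = np.zeros(n_features)  # value-function weights
e = np.zeros(n_features)      # eligibility trace

s = 0
for t in range(1000):
    # Two actions: behavior policy pi0 is uniform (0.5 each),
    # target policy pi prefers action 0 with probability 0.8.
    a = rng.integers(2)
    rho = (0.8 if a == 0 else 0.2) / 0.5  # importance weight pi(a|s)/pi0(a|s)
    s_next = rng.integers(n_states)       # toy transition kernel (uniform)
    r = 1.0 if s_next == 0 else 0.0       # toy reward

    delta = r + gamma * phi[s_next] @ theta - phi[s] @ theta  # TD error
    e = rho * (gamma * lam * e + phi[s])  # importance-corrected trace
    theta = theta + alpha * delta * e     # off-policy TD(lambda) update
    s = s_next

print(theta)
```

A least-squares counterpart such as off-policy LSTD(λ) would accumulate the same corrected traces into a matrix and vector and solve a linear system instead of taking stochastic gradient-like steps.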
Document type: Research Report, 2013, 43 pp.

Cited literature: 29 references.
Contributor: Bruno Scherrer
Submitted on: Friday, April 12, 2013 - 13:12:35
Last modified on: Tuesday, December 18, 2018 - 16:40:21
Document(s) archived on: Monday, April 3, 2017 - 04:24:16
  • HAL Id: hal-00644516, version 2
  • arXiv: 1304.3999


Matthieu Geist, Bruno Scherrer. Off-policy Learning with Eligibility Traces: A Survey. [Research Report] 2013, pp.43. 〈hal-00644516v2〉