Off-policy Learning with Eligibility Traces: A Survey

Matthieu Geist; Bruno Scherrer

Report (Research Report) Year: 2013

Matthieu Geist

Role: Author
PersonId : 6945
IdHAL : matthieu-geist

SUPELEC-Campus Metz

Bruno Scherrer

Role: Author
PersonId : 1406
IdHAL : bruno-scherrer
IdRef : 073360708

Autonomous intelligent machine

Abstract

In the framework of Markov Decision Processes, we consider off-policy learning, that is, the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated by some other policy. We briefly review the on-policy learning algorithms of the literature (gradient-based and least-squares-based), adopting a unified algorithmic view. Then, we highlight a systematic approach for adapting them to off-policy learning with eligibility traces. This leads to some known algorithms (off-policy LSTD(λ), LSPE(λ), TD(λ), TDC/GQ(λ)) and suggests new extensions (off-policy FPKF(λ), BRM(λ), gBRM(λ), GTD2(λ)). We describe a comprehensive algorithmic derivation of all algorithms in a recursive and memory-efficient form, discuss their known convergence properties, and illustrate their relative empirical behavior on Garnet problems. Our experiments suggest that the most standard algorithms, on- and off-policy LSTD(λ)/LSPE(λ), and TD(λ) if the feature space dimension is too large for a least-squares approach, perform the best.
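To make the setting in the abstract concrete, here is a minimal sketch of one common form of off-policy TD(λ) with a linear value approximation and importance-weighted eligibility traces. It is an illustration under stated assumptions, not necessarily the exact recursion derived in the report: the function names, the transition format `(state, reward, next_state, rho)`, the constant step size `alpha`, and the one-hot feature map are all hypothetical choices made for this example. Here `rho` is the importance weight π(a|s)/μ(a|s) of the action taken by the behavior policy μ, relative to the target policy π.

```python
from typing import Callable, Iterable, List, Tuple

# (state, reward, next_state, rho); rho = pi(a|s) / mu(a|s) is assumed to be
# supplied with each transition of the single behavior trajectory.
Transition = Tuple[int, float, int, float]


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def off_policy_td_lambda(
    transitions: Iterable[Transition],
    phi: Callable[[int], List[float]],
    n_features: int,
    gamma: float = 0.9,
    lam: float = 0.5,
    alpha: float = 0.1,
) -> List[float]:
    """Estimate theta such that phi(s)^T theta approximates V^pi(s).

    Assumed update (one common variant, hypothetical naming):
        z     <- gamma * lam * rho_prev * z + phi(s)
        delta <- r + gamma * phi(s')^T theta - phi(s)^T theta
        theta <- theta + alpha * rho * delta * z
    """
    theta = [0.0] * n_features
    z = [0.0] * n_features      # eligibility trace
    rho_prev = 1.0              # unused at the first step because z is zero
    for s, r, s_next, rho in transitions:
        f, f_next = phi(s), phi(s_next)
        # Decay the trace, reweighting past features by the previous ratio.
        z = [gamma * lam * rho_prev * zi + fi for zi, fi in zip(z, f)]
        # Temporal-difference error under the current parameters.
        delta = r + gamma * dot(f_next, theta) - dot(f, theta)
        # Importance-weighted gradient-style correction.
        theta = [t + alpha * rho * delta * zi for t, zi in zip(theta, z)]
        rho_prev = rho
    return theta


def one_hot(n_states: int) -> Callable[[int], List[float]]:
    """Tabular features, used here only to keep the example small."""
    def phi(s: int) -> List[float]:
        return [1.0 if i == s else 0.0 for i in range(n_states)]
    return phi


# Usage: two transitions of a two-state chain with made-up importance weights.
theta = off_policy_td_lambda(
    [(0, 1.0, 1, 2.0), (1, 0.0, 0, 0.5)], one_hot(2), n_features=2
)
```

Setting every `rho` to 1 recovers ordinary on-policy TD(λ). Memory use is linear in the number of features, which is the regime where the abstract suggests TD(λ) over a least-squares method.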

Keywords

Reinforcement Learning; Value Estimation; Off-policy Learning; Eligibility Traces

Domains

Artificial Intelligence [cs.AI]; Operations Research [math.OC]

Main file

jmlr.pdf (1022.28 KB)

Origin: Files produced by the author(s)

Contributor: Bruno Scherrer

https://inria.hal.science/hal-00644516

Submitted on: Friday, April 12, 2013, 13:12:35

Last modified on: Thursday, February 1, 2024, 10:04:25

Long-term archiving on: Monday, April 3, 2017, 04:24:16

Dates and versions

hal-00644516 , version 1 (24-11-2011)

hal-00644516 , version 2 (12-04-2013)

Identifiers

HAL Id : hal-00644516 , version 2
arXiv: 1304.3999

Cite

Matthieu Geist, Bruno Scherrer. Off-policy Learning with Eligibility Traces: A Survey. [Research Report] 2013, pp.43. ⟨hal-00644516v2⟩

Collections

SUPELEC UNIV-RENNES1 CNRS INRIA IRISA SUP_IMS UNIV-LORRAINE INRIA2 LORIA LORIA-AIS LARA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

414 Views

420 Downloads
