# Off-policy Learning with Eligibility Traces: A Survey

MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
Abstract: In the framework of Markov Decision Processes, we consider linear \emph{off-policy} learning, that is, the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated by some other policy. We briefly review \emph{on-policy} learning algorithms of the literature (gradient-based and least-squares-based), adopting a unified algorithmic view. Then, we highlight a systematic approach for adapting them to \emph{off-policy} learning \emph{with eligibility traces}. This leads to some known algorithms -- off-policy LSTD($\lambda$), LSPE($\lambda$), TD($\lambda$), TDC/GQ($\lambda$) -- and suggests new extensions -- off-policy FPKF($\lambda$), BRM($\lambda$), gBRM($\lambda$), GTD2($\lambda$). We describe a comprehensive algorithmic derivation of all algorithms in a recursive and memory-efficient form, discuss their known convergence properties, and illustrate their relative empirical behavior on Garnet problems. Our experiments suggest that the most standard algorithms -- on- and off-policy LSTD($\lambda$)/LSPE($\lambda$), and TD($\lambda$) if the feature-space dimension is too large for a least-squares approach -- perform best.
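To make the abstract's central object concrete, here is a minimal sketch (not from the paper) of off-policy TD($\lambda$) with importance-sampling-weighted eligibility traces. The toy chain MDP, the step size, and the particular trace update $e_t = \rho_t(\gamma\lambda e_{t-1} + \phi(s_t))$ are illustrative assumptions; the survey itself covers several trace formulations.

```python
import numpy as np

def phi(s, n):
    """One-hot (tabular) feature vector for state s."""
    x = np.zeros(n)
    x[s] = 1.0
    return x

# Toy 5-state chain (an assumption for illustration): states 0..4; stepping
# into state 4 ends the episode with reward 1. Behavior policy mu: uniform
# over {left, right}; target policy pi: always right, so the importance
# ratio rho = pi(a|s)/mu(a|s) is 2 for right and 0 for left.
n_states, gamma, lam, alpha = 5, 0.9, 0.5, 0.05
rng = np.random.default_rng(0)
theta = np.zeros(n_states)            # linear (here tabular) value weights

for _ in range(3000):
    s, e = 0, np.zeros(n_states)      # e: eligibility trace
    while True:
        a = rng.choice([-1, 1])       # behavior: uniform random walk
        rho = 2.0 if a == 1 else 0.0  # importance-sampling ratio
        s_next = max(s + a, 0)        # reflect at the left end
        done = (s_next == n_states - 1)
        r = 1.0 if done else 0.0
        v = theta @ phi(s, n_states)
        v_next = 0.0 if done else theta @ phi(s_next, n_states)
        delta = r + gamma * v_next - v                 # TD error
        e = rho * (gamma * lam * e + phi(s, n_states)) # weighted trace
        theta += alpha * delta * e
        if done:
            break
        s = s_next

# Under the target policy, V(s) = gamma**(3 - s) for s in 0..3,
# so theta should approach roughly [0.729, 0.81, 0.9, 1.0, 0].
```

With tabular features the importance weights make the expected update match the target policy's Bellman equation, so the iterates approach the target-policy values even though the data come from the uniform behavior policy; with general linear features, the abstract notes that convergence is a subtler matter and motivates the gradient-based variants (TDC/GQ($\lambda$), GTD2($\lambda$)).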
Document type: Journal articles
https://hal.inria.fr/hal-00921275
Contributor: Bruno Scherrer
Submitted on: Friday, December 20, 2013 - 10:35:19 AM
Last modification on: Wednesday, July 31, 2019 - 4:18:02 PM
Long-term archiving on: Friday, March 21, 2014 - 10:15:29 AM

### Files

jmlr.pdf (produced by the author(s))

### Identifiers

• HAL Id: hal-00921275, version 1

### Citation

Matthieu Geist, Bruno Scherrer. Off-policy Learning with Eligibility Traces: A Survey. Journal of Machine Learning Research, Microtome Publishing, 2014, 15 (1), pp.289-333. ⟨hal-00921275⟩
