Inria - Institut national de recherche en sciences et technologies du numérique
Report (Research Report) Year: 2011

Recursive Least-Squares Off-policy Learning with Eligibility Traces

Bruno Scherrer
Matthieu Geist

Abstract

In the framework of Markov Decision Processes, we consider the problem of learning a linear approximation of the value function of some fixed policy from a single trajectory, possibly generated by some other policy. We describe a systematic approach for adapting on-policy least-squares learning algorithms from the literature (LSTD, LSPE, FPKF and GPTD/KTD) to off-policy learning with eligibility traces. This leads to two known algorithms, LSTD($\lambda$) and LSPE($\lambda$), and suggests new extensions of FPKF and GPTD/KTD. We describe their recursive implementation, discuss their convergence properties, and illustrate their behavior experimentally. Overall, our study suggests that the state-of-the-art LSTD($\lambda$) remains the best least-squares algorithm.
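To make the abstract's setting concrete, here is a minimal sketch of a recursive off-policy LSTD($\lambda$) update of the kind the report studies. It uses one common per-decision importance-sampling formulation (trace $z \leftarrow \rho(\gamma\lambda z + \phi)$), and maintains $A^{-1}$ via the Sherman-Morrison identity so each step costs $O(p^2)$ for $p$ features; the function and variable names are hypothetical, not the paper's notation, and the exact placement of the importance ratios varies across variants in the literature.

```python
import numpy as np

def off_policy_lstd_lambda(transitions, n_features, gamma=0.9, lam=0.5, eps=1e-6):
    """Recursive off-policy LSTD(lambda) sketch (one common variant).

    transitions: iterable of (phi, reward, phi_next, rho) tuples, where
    rho = pi(a|s) / mu(a|s) is the per-step importance ratio of the
    target policy pi over the behavior policy mu (rho = 1 recovers
    on-policy LSTD(lambda)).  Solves A theta = b with
        A = sum_t z_t (phi_t - gamma * phi_{t+1})^T,   b = sum_t z_t r_t.
    """
    inv_A = np.eye(n_features) / eps   # A_0 = eps * I keeps A invertible early on
    b = np.zeros(n_features)
    z = np.zeros(n_features)           # eligibility trace
    for phi, r, phi_next, rho in transitions:
        z = rho * (gamma * lam * z + phi)   # importance-weighted trace
        d = phi - gamma * phi_next          # temporal-difference feature vector
        # Sherman-Morrison: update inv_A for the rank-one change A + z d^T
        Az = inv_A @ z
        dA = d @ inv_A
        inv_A -= np.outer(Az, dA) / (1.0 + dA @ z)
        b += z * r
    return inv_A @ b                       # value-function weights theta
```

On a small episodic chain with tabular (one-hot) features and on-policy data, this recovers the exact value function, which is a convenient sanity check before moving to genuinely off-policy ratios.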
Main file
reclstd.pdf (562.07 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00644516 , version 1 (24-11-2011)
hal-00644516 , version 2 (12-04-2013)

Identifiers

  • HAL Id : hal-00644516 , version 1

Cite

Bruno Scherrer, Matthieu Geist. Recursive Least-Squares Off-policy Learning with Eligibility Traces. [Research Report] 2011, pp.29. ⟨hal-00644516v1⟩