Inria - Institut national de recherche en sciences et technologies du numérique
Report (Research Report) Year: 2011

Recursive Least-Squares Off-policy Learning with Eligibility Traces

Bruno Scherrer
Matthieu Geist

Abstract

In the framework of Markov Decision Processes, we consider the problem of learning a linear approximation of the value function of some fixed policy from a single trajectory, possibly generated by some other policy. We describe a systematic approach for adapting on-policy least-squares learning algorithms from the literature (LSTD, LSPE, FPKF and GPTD/KTD) to off-policy learning with eligibility traces. This leads to two known algorithms, LSTD($\lambda$) and LSPE($\lambda$), and suggests new extensions of FPKF and GPTD/KTD. We describe their recursive implementation, discuss their convergence properties, and illustrate their behavior experimentally. Overall, our study suggests that the state-of-the-art LSTD($\lambda$) remains the best least-squares algorithm.
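To make the abstract's setting concrete, here is a minimal sketch of a recursive off-policy LSTD($\lambda$) update of the kind the report studies. It uses one common per-decision importance-sampling formulation (trace $z \leftarrow \rho(\gamma\lambda z + \phi)$), and maintains $A^{-1}$ via the Sherman-Morrison identity so each step costs $O(p^2)$ for $p$ features; the function and variable names are hypothetical, not the paper's notation, and the exact placement of the importance ratios varies across variants in the literature.

```python
import numpy as np

def off_policy_lstd_lambda(transitions, n_features, gamma=0.9, lam=0.5, eps=1e-6):
    """Recursive off-policy LSTD(lambda) sketch (one common variant).

    transitions: iterable of (phi, reward, phi_next, rho) tuples, where
    rho = pi(a|s) / mu(a|s) is the per-step importance ratio of the
    target policy pi over the behavior policy mu (rho = 1 recovers
    on-policy LSTD(lambda)).  Solves A theta = b with
        A = sum_t z_t (phi_t - gamma * phi_{t+1})^T,   b = sum_t z_t r_t.
    """
    inv_A = np.eye(n_features) / eps   # A_0 = eps * I keeps A invertible early on
    b = np.zeros(n_features)
    z = np.zeros(n_features)           # eligibility trace
    for phi, r, phi_next, rho in transitions:
        z = rho * (gamma * lam * z + phi)   # importance-weighted trace
        d = phi - gamma * phi_next          # temporal-difference feature vector
        # Sherman-Morrison: update inv_A for the rank-one change A + z d^T
        Az = inv_A @ z
        dA = d @ inv_A
        inv_A -= np.outer(Az, dA) / (1.0 + dA @ z)
        b += z * r
    return inv_A @ b                       # value-function weights theta
```

On a small episodic chain with tabular (one-hot) features and on-policy data, this recovers the exact value function, which is a convenient sanity check before moving to genuinely off-policy ratios.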
Main file
reclstd.pdf (562.07 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00644516 , version 1 (24-11-2011)
hal-00644516 , version 2 (12-04-2013)

Identifiers

  • HAL Id : hal-00644516 , version 1

Cite

Bruno Scherrer, Matthieu Geist. Recursive Least-Squares Off-policy Learning with Eligibility Traces. [Research Report] 2011, pp.29. ⟨hal-00644516v1⟩