# Off-policy Learning with Eligibility Traces: A Survey

MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
Abstract: In the framework of Markov Decision Processes, we consider linear \emph{off-policy} learning, that is, the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated by some other policy. We briefly review \emph{on-policy} learning algorithms from the literature (gradient-based and least-squares-based), adopting a unified algorithmic view. Then, we highlight a systematic approach for adapting them to \emph{off-policy} learning \emph{with eligibility traces}. This leads to some known algorithms -- off-policy LSTD($\lambda$), LSPE($\lambda$), TD($\lambda$), TDC/GQ($\lambda$) -- and suggests new extensions -- off-policy FPKF($\lambda$), BRM($\lambda$), gBRM($\lambda$), GTD2($\lambda$). We describe a comprehensive algorithmic derivation of all algorithms in a recursive and memory-efficient form, discuss their known convergence properties and illustrate their relative empirical behavior on Garnet problems. Our experiments suggest that the most standard algorithms -- on- and off-policy LSTD($\lambda$)/LSPE($\lambda$), and TD($\lambda$) if the feature space dimension is too large for a least-squares approach -- perform the best.
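The abstract names the algorithms without stating any update rule. As a rough illustration of what linear off-policy learning with eligibility traces can look like, the sketch below implements one common formulation of off-policy TD($\lambda$), in which the trace and the temporal-difference update are reweighted by importance ratios $\rho_i = \pi(a_i|s_i)/\mu(a_i|s_i)$ between the target policy $\pi$ and the behavior policy $\mu$. The function name, the argument layout, the constant step size and the exact placement of the ratios are assumptions made for illustration; they are not taken from this record and may differ from the paper's own derivation.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def off_policy_td_lambda(
    features: Sequence[Sequence[float]],  # phi(s_0), ..., phi(s_n)
    rewards: Sequence[float],             # r_0, ..., r_{n-1}
    rhos: Sequence[float],                # importance ratios rho_0, ..., rho_{n-1}
    gamma: float,                         # discount factor
    lam: float,                           # eligibility-trace parameter lambda
    alpha: float,                         # constant step size (an assumption)
) -> List[float]:
    """Hypothetical sketch: one pass of off-policy TD(lambda) over a trajectory."""
    dim = len(features[0])
    theta = [0.0] * dim   # linear value-function parameters
    trace = [0.0] * dim   # eligibility trace
    prev_rho = 0.0        # has no effect at i = 0 because the trace starts at zero
    for i, (r, rho) in enumerate(zip(rewards, rhos)):
        phi, phi_next = features[i], features[i + 1]
        # Decay the trace, reweighted by the previous importance ratio.
        trace = [gamma * lam * prev_rho * z + f for z, f in zip(trace, phi)]
        # Temporal-difference error under the current parameters.
        td_error = r + gamma * dot(phi_next, theta) - dot(phi, theta)
        # Update reweighted by the current importance ratio.
        theta = [t + alpha * rho * td_error * z for t, z in zip(theta, trace)]
        prev_rho = rho
    return theta
```

With all ratios equal to 1 the loop reduces to ordinary on-policy TD($\lambda$); a ratio of 0 (an action the target policy would never take) both suppresses the current update and cuts the trace at the next step.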
Document type:
Journal article
Journal of Machine Learning Research, 2014, 15 (1), pp.289-333

https://hal.inria.fr/hal-00921275
Contributor: Bruno Scherrer
Submitted on: Friday, December 20, 2013 - 10:35:19
Last modified on: Thursday, January 11, 2018 - 06:25:23
Document(s) archived on: Friday, March 21, 2014 - 10:15:29

### Files

jmlr.pdf
Files produced by the author(s)

### Identifiers

• HAL Id: hal-00921275, version 1

### Citation

Matthieu Geist, Bruno Scherrer. Off-policy Learning with Eligibility Traces: A Survey. Journal of Machine Learning Research, 2014, 15 (1), pp.289-333. 〈hal-00921275〉
