Rate of Convergence and Error Bounds for LSTD($\lambda$)

Manel Tagorti (1), Bruno Scherrer (1)
(1) MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
Abstract: We consider LSTD($\lambda$), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a $\beta$-mixing assumption, we derive, for any value of $\lambda \in (0,1)$, a high-probability estimate of the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm that extends (and slightly improves) the bound derived by Lazaric et al. (2012) in the specific case where $\lambda=0$. In particular, our analysis sheds some light on the choice of $\lambda$ with respect to the quality of the chosen linear space and the number of samples, in a way that is consistent with simulations.
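
As a rough illustration of the estimator the abstract refers to, the following is a minimal sketch of batch LSTD($\lambda$) on a single trajectory, written with the standard library only. It is not taken from the report: the function names (`lstd_lambda`, `solve_linear`), the toy two-state chain, and the one-hot features are assumptions made for the example, and the discount factor is written `gamma`.

```python
from typing import Callable, List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lstd_lambda(
    states: Sequence[int],
    rewards: Sequence[float],
    phi: Callable[[int], List[float]],
    gamma: float,
    lam: float,
) -> List[float]:
    """Estimate theta such that phi(x)^T theta approximates the value of x.

    states has one more entry than rewards: rewards[t] is received on the
    transition states[t] -> states[t + 1].
    """
    d = len(phi(states[0]))
    a = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    z = [0.0] * d  # eligibility trace
    for t in range(len(states) - 1):
        f_t, f_next = phi(states[t]), phi(states[t + 1])
        # Trace update: z_t = lam * gamma * z_{t-1} + phi(x_t)
        z = [lam * gamma * z_k + f_k for z_k, f_k in zip(z, f_t)]
        # Temporal-difference feature: phi(x_t) - gamma * phi(x_{t+1})
        diff = [f - gamma * g for f, g in zip(f_t, f_next)]
        for i in range(d):
            b[i] += z[i] * rewards[t]
            for j in range(d):
                a[i][j] += z[i] * diff[j]
    return solve_linear(a, b)


# Toy example (an assumption for illustration): deterministic chain 0 -> 1 -> 0 -> ...
# with reward 1 when leaving state 0, reward 0 when leaving state 1.
states = [0, 1, 0, 1, 0]
rewards = [1.0, 0.0, 1.0, 0.0]
one_hot = lambda s: [1.0 if s == k else 0.0 for k in range(2)]
theta = lstd_lambda(states, rewards, one_hot, gamma=0.5, lam=0.5)
print(theta)
```

In this toy case the chain is deterministic and the features are one-hot, so the Bellman equation holds exactly on every sampled transition and the estimate should match the true values $V(0)=4/3$ and $V(1)=2/3$ up to rounding. With a restricted linear space and noisy samples, the estimate depends on $\lambda$ and on the number of samples, which is the regime the bounds in the report address.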

https://hal.inria.fr/hal-00990525
Contributor: Bruno Scherrer
Submitted on: Tuesday, May 13, 2014 - 15:49:54
Last modified on: Thursday, January 11, 2018 - 06:25:23
Document(s) archived on: Monday, April 10, 2017 - 22:16:10

Files

report.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00990525, version 1
  • arXiv: 1405.3229

Citation

Manel Tagorti, Bruno Scherrer. Rate of Convergence and Error Bounds for LSTD($\lambda$). [Research Report] 2014. 〈hal-00990525〉

Metrics

  • Record views: 324
  • File downloads: 345