On the rate of convergence and error bounds for LSTD(λ)

Manel Tagorti 1, * Bruno Scherrer 2, 3
* Auteur correspondant
1 MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
2 BIGS - Biology, genetics and statistics
Inria Nancy - Grand Est, IECL - Institut Élie Cartan de Lorraine
Abstract : We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces algorithm proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption , we derive, for any value of λ ∈ (0, 1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm, that extends (and slightly improves) that derived by Lazaric et al. (2012) in the specific case where λ = 0. In the context of temporal-difference algorithms with value function approximation, this analysis is to our knowledge the first to provide insight on the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.
Type de document :
Communication dans un congrès
ICML 2015, Jul 2015, Lille, France. 2015
Liste complète des métadonnées

https://hal.inria.fr/hal-01186667
Contributeur : Bruno Scherrer <>
Soumis le : mardi 25 août 2015 - 14:17:16
Dernière modification le : mardi 16 octobre 2018 - 16:35:59
Document(s) archivé(s) le : jeudi 26 novembre 2015 - 14:01:34

Fichiers

lstd.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : hal-01186667, version 1

Citation

Manel Tagorti, Bruno Scherrer. On the rate of convergence and error bounds for LSTD(λ). ICML 2015, Jul 2015, Lille, France. 2015. 〈hal-01186667〉

Partager

Métriques

Consultations de la notice

278

Téléchargements de fichiers

130