On the Rate of Convergence and Error Bounds for LSTD(λ)

Manel Tagorti 1,*, Bruno Scherrer 2,3
* Corresponding author
1 MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
2 BIGS - Biology, genetics and statistics
Inria Nancy - Grand Est, IECL - Institut Élie Cartan de Lorraine
3 Probabilités et statistiques
IECL - Institut Élie Cartan de Lorraine
Abstract: We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0, 1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm that extends (and slightly improves) the one derived by Lazaric et al. (2012) in the specific case where λ = 0. In the context of temporal-difference algorithms with value function approximation, this analysis is, to our knowledge, the first to provide insight into the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.
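For readers unfamiliar with the algorithm, the following is a minimal sketch of the LSTD(λ) estimate in its usual textbook form, not code taken from the paper. It assumes a single trajectory of feature vectors and per-transition rewards; the names `lstd_lambda`, `solve_linear`, `features`, `rewards`, `gamma` and `lam` are illustrative choices, and the small Gaussian-elimination helper only exists to keep the example free of dependencies.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A is numerically singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def lstd_lambda(features, rewards, gamma, lam):
    """Return theta solving A theta = b for one trajectory.

    features: n feature vectors phi(x_1), ..., phi(x_n)
    rewards:  n - 1 rewards, one per observed transition
    gamma:    discount factor; lam: eligibility-trace parameter
    """
    d = len(features[0])
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    z = [0.0] * d  # eligibility trace
    for i, r in enumerate(rewards):
        phi, phi_next = features[i], features[i + 1]
        # z_i = gamma * lam * z_{i-1} + phi(x_i)
        z = [gamma * lam * z_k + p_k for z_k, p_k in zip(z, phi)]
        diff = [p - gamma * q for p, q in zip(phi, phi_next)]
        for j in range(d):
            b[j] += z[j] * r
            for k in range(d):
                A[j][k] += z[j] * diff[k]
    # A common 1/(n-1) normalization of A and b cancels in the solve.
    return solve_linear(A, b)


# Toy example (an assumption for illustration): a deterministic two-state
# cycle 0 -> 1 -> 0 -> ... with reward 1 in state 0, 0 in state 1.
one_hot = {0: [1.0, 0.0], 1: [0.0, 1.0]}
states = [0, 1, 0, 1, 0]
features = [one_hot[s] for s in states]
rewards = [1.0 if s == 0 else 0.0 for s in states[:-1]]
theta = lstd_lambda(features, rewards, gamma=0.5, lam=0.5)
print(theta)
```

With one-hot features on this deterministic chain, the estimate coincides with the exact values 4/3 and 2/3 for the two states, whatever the value of λ. The interesting regime studied in the abstract is the opposite one, where the feature space cannot represent the value function exactly and the samples are noisy, so that the choice of λ matters.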
Document type:
Conference paper
ICML 2015, Jul 2015, Lille, France. 2015

https://hal.inria.fr/hal-01186667
Contributor: Bruno Scherrer
Submitted on: Tuesday, 25 August 2015 - 14:17:16
Last modified on: Thursday, 11 January 2018 - 06:26:22
Document(s) archived on: Thursday, 26 November 2015 - 14:01:34

Files

lstd.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01186667, version 1

Citation

Manel Tagorti, Bruno Scherrer. On the Rate of Convergence and Error Bounds for LSTD(λ). ICML 2015, Jul 2015, Lille, France. 2015. 〈hal-01186667〉
