On the rate of convergence and error bounds for LSTD(λ)

Manel Tagorti; Bruno Scherrer

Conference paper, Year: 2015

Manel Tagorti

Role: Corresponding author
PersonId : 969050

Autonomous intelligent machine

Bruno Scherrer

Role: Author
PersonId : 1406
IdHAL : bruno-scherrer
IdRef : 073360708

Biology, genetics and statistics

Institut Élie Cartan de Lorraine

Abstract

We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0, 1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm that extends (and slightly improves) the one derived by Lazaric et al. (2012) in the specific case where λ = 0. In the context of temporal-difference algorithms with value function approximation, this analysis is, to our knowledge, the first to provide insight into the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.
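For readers unfamiliar with the algorithm, the following is a minimal sketch of the textbook LSTD(λ) estimator in the spirit of Boyan (2002), written with NumPy. It is not taken from the paper: the function names `lstd_lambda` and `demo`, the argument layout, and the toy three-state cycle are assumptions made for illustration only. The sketch accumulates an eligibility trace z_t = λγ z_{t-1} + φ(x_t) along a single trajectory, builds A = Σ z_t (φ(x_t) − γ φ(x_{t+1}))ᵀ and b = Σ z_t r_t, and solves Aθ = b.

```python
import numpy as np


def lstd_lambda(features, rewards, gamma, lam):
    """Return theta such that features @ theta approximates the value function.

    features: array of shape (n + 1, d); row t is phi(x_t) along one trajectory.
    rewards:  array of shape (n,); rewards[t] is received on the move x_t -> x_{t+1}.
    gamma:    discount factor in [0, 1).
    lam:      eligibility-trace parameter lambda in [0, 1].
    """
    features = np.asarray(features, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    n, d = len(rewards), features.shape[1]
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)  # eligibility trace
    for t in range(n):
        z = lam * gamma * z + features[t]
        A += np.outer(z, features[t] - gamma * features[t + 1])
        b += z * rewards[t]
    # Assumption: A is invertible, which requires enough samples
    # covering the span of the features.
    return np.linalg.solve(A, b)


def demo(n=300, gamma=0.9, lam=0.5):
    """Hypothetical toy check: deterministic cycle 0 -> 1 -> 2 -> 0, tabular features."""
    states = np.arange(n + 1) % 3
    features = np.eye(3)[states]                 # one-hot (tabular) features
    rewards = (states[:-1] == 0).astype(float)   # reward 1 when leaving state 0
    theta = lstd_lambda(features, rewards, gamma, lam)
    # Exact value function from the Bellman equation v = r + gamma * P v.
    P = np.roll(np.eye(3), 1, axis=1)
    true_v = np.linalg.solve(np.eye(3) - gamma * P, np.array([1.0, 0.0, 0.0]))
    return theta, true_v
```

In this toy setting the transitions are deterministic and the features are tabular, so the estimate should match the exact value function for any λ; this is only a sanity check. The question studied in the paper, how λ trades off against the approximation quality of the feature space and the number of samples, only arises with stochastic transitions and a restricted feature space.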

Domains

Optimization and control [math.OC] Machine learning [cs.LG] Statistics [math.ST]

Main file

lstd.pdf (283.5 KB)

Origin: Files produced by the author(s)

Contributor: Bruno Scherrer

https://inria.hal.science/hal-01186667

Submitted on: Tuesday, August 25, 2015, 14:17:16

Last modified on: Friday, May 3, 2024, 14:36:58

Long-term archiving on: Thursday, November 26, 2015, 14:01:34

Dates and versions

hal-01186667 , version 1 (25-08-2015)

Identifiers

HAL Id : hal-01186667 , version 1

Cite

Manel Tagorti, Bruno Scherrer. On the rate of convergence and error bounds for LSTD(λ). ICML 2015, Jul 2015, Lille, France. ⟨hal-01186667⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA IECN INSMI UNIV-LORRAINE INRIA2 TDS-MACS LORIA LORIA-AIS UR1-MATH-STIC UR1-UFR-ISTIC IECLEDP UNIV-RENNES UR1-MATH-NUM

167 views

106 downloads
