Skip to Main content Skip to Navigation
Conference papers

On the rate of convergence and error bounds for LSTD(λ)

Manel Tagorti 1, * Bruno Scherrer 2, 3 
* Corresponding author
1 MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
2 BIGS - Biology, genetics and statistics
Inria Nancy - Grand Est, IECL - Institut Élie Cartan de Lorraine
Abstract : We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces algorithm proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption , we derive, for any value of λ ∈ (0, 1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm, that extends (and slightly improves) that derived by Lazaric et al. (2012) in the specific case where λ = 0. In the context of temporal-difference algorithms with value function approximation, this analysis is to our knowledge the first to provide insight on the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.
Complete list of metadata
Contributor : Bruno Scherrer Connect in order to contact the contributor
Submitted on : Tuesday, August 25, 2015 - 2:17:16 PM
Last modification on : Thursday, January 20, 2022 - 5:27:03 PM
Long-term archiving on: : Thursday, November 26, 2015 - 2:01:34 PM


Files produced by the author(s)


  • HAL Id : hal-01186667, version 1


Manel Tagorti, Bruno Scherrer. On the rate of convergence and error bounds for LSTD(λ). ICML 2015, Jul 2015, Lille, France. ⟨hal-01186667⟩



Record views


Files downloads