On the rate of convergence and error bounds for LSTD(λ)

Manel Tagorti 1, * Bruno Scherrer 2, 3
* Corresponding author
1 MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
2 BIGS - Biology, genetics and statistics
Inria Nancy - Grand Est, IECL - Institut Élie Cartan de Lorraine
Abstract : We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces algorithm proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption , we derive, for any value of λ ∈ (0, 1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm, that extends (and slightly improves) that derived by Lazaric et al. (2012) in the specific case where λ = 0. In the context of temporal-difference algorithms with value function approximation, this analysis is to our knowledge the first to provide insight on the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.
Complete list of metadatas

https://hal.inria.fr/hal-01186667
Contributor : Bruno Scherrer <>
Submitted on : Tuesday, August 25, 2015 - 2:17:16 PM
Last modification on : Tuesday, December 18, 2018 - 4:40:21 PM
Long-term archiving on : Thursday, November 26, 2015 - 2:01:34 PM

Files

lstd.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01186667, version 1

Citation

Manel Tagorti, Bruno Scherrer. On the rate of convergence and error bounds for LSTD(λ). ICML 2015, Jul 2015, Lille, France. ⟨hal-01186667⟩

Share

Metrics

Record views

309

Files downloads

146