Least-Squares λ Policy Iteration: Bias-Variance Trade-off in Control Problems

Christophe Thiery 1, Bruno Scherrer 1
1 MAIA (Autonomous Intelligent Machine), INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: In the context of MDPs with large state spaces and linear value function approximation, we introduce a new approximate version of λ-Policy Iteration (Bertsekas & Ioffe, 1996), a method that generalizes Value Iteration and Policy Iteration with a parameter λ ∈ (0, 1). Our approach, called Least-Squares λ Policy Iteration, generalizes LSPI (Lagoudakis & Parr, 2003), which makes efficient use of training samples compared to classical temporal-difference methods. The motivation of our work is to exploit the λ parameter within the least-squares context, without having to generate new samples at each iteration or to know a model of the MDP. We provide a performance bound that shows the soundness of the algorithm. We show empirically, on a simple chain problem and on the game of Tetris, that the λ parameter acts as a bias-variance trade-off that can improve convergence and the quality of the policy obtained.
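
To make the role of λ concrete: below is a minimal, hypothetical Python/NumPy sketch of an LSTD(λ)-style least-squares evaluation step on a trajectory collected under a fixed policy. It illustrates how λ blends one-step and multi-step information inside a single least-squares estimate; it is not the paper's Least-Squares λ Policy Iteration algorithm, and the function and parameter names are illustrative only.

import numpy as np

def lstd_lambda(trajectory, phi, gamma=0.95, lam=0.5):
    """Hypothetical LSTD(lambda) sketch: estimate linear value-function
    weights theta from a single trajectory under a fixed policy.

    trajectory: list of (state, reward, next_state) transitions
    phi:        feature map, state -> np.ndarray of shape (d,)
    lam:        the lambda parameter (bias-variance trade-off)
    """
    d = phi(trajectory[0][0]).shape[0]
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)                        # eligibility trace
    for s, r, s_next in trajectory:
        z = gamma * lam * z + phi(s)       # decay the trace, add current features
        A += np.outer(z, phi(s) - gamma * phi(s_next))
        b += z * r
    # Small ridge term in case A is singular for short trajectories.
    return np.linalg.solve(A + 1e-6 * np.eye(d), b)

With lam = 0 the trace reduces to the current feature vector, giving a one-step, lower-variance but more biased estimate; as lam approaches 1 the trace accumulates features over the whole trajectory, behaving more like a Monte-Carlo estimate with higher variance and lower bias.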
Document type: Conference papers


https://hal.inria.fr/inria-00520841
Contributor: Christophe Thiery
Submitted on: Friday, September 24, 2010 - 1:27:30 PM

File

article.pdf (files produced by the author(s))

Identifiers

  • HAL Id: inria-00520841, version 1

Citation

Christophe Thiery, Bruno Scherrer. Least-Squares λ Policy Iteration: Bias-Variance Trade-off in Control Problems. International Conference on Machine Learning, Jun 2010, Haifa, Israel. ⟨inria-00520841⟩
