Performance bounds for Lambda Policy Iteration - Inria - Institut national de recherche en sciences et technologies du numérique
Report (Research Report), Year: 2007

Performance bounds for Lambda Policy Iteration

Bruno Scherrer

Abstract

We consider the discrete-time, infinite-horizon, discounted, stationary optimal control problem formalized by Markov Decision Processes. We study Lambda Policy Iteration, a family of algorithms parameterized by lambda, originally introduced by Ioffe and Bertsekas. Lambda Policy Iteration generalizes the standard algorithms Value Iteration and Policy Iteration, and has some connections with TD(Lambda), introduced by Sutton & Barto. We deepen the original theory developed by Ioffe and Bertsekas by providing convergence rate bounds that generalize the standard bounds for Value Iteration described, for instance, by Puterman. We also develop the theory of this algorithm when it is used in an approximate form. In doing so, we extend and unify the separate analyses developed by Munos for Approximate Value Iteration and Approximate Policy Iteration.
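To make the claim that Lambda Policy Iteration interpolates between Value Iteration and Policy Iteration concrete, here is a minimal sketch of the exact (non-approximate) algorithm on a small tabular MDP with a known model. It assumes the commonly stated form of the update, v_{k+1} = v_k + (I - λγP_π)^{-1}(T_π v_k - v_k), where π is greedy with respect to v_k, T_π is the Bellman operator of π and P_π its transition matrix. This is our restatement, not code from the report. The names `lambda_policy_iteration` and `solve` and the layout of `P` (indexed `P[a][s][s2]`) and `R` (indexed `R[a][s]`) are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lambda_policy_iteration(P: List[Matrix], R: List[Vector], gamma: float,
                            lam: float, v0: Vector, iters: int) -> Tuple[Vector, List[int]]:
    """Hypothetical exact lambda-PI sketch. P[a][s][s2]: transition probabilities,
    R[a][s]: expected reward. Returns the last value v_{k+1} and the greedy
    policy pi_{k+1} (computed from v_k) that produced it."""
    n_actions, n_states = len(P), len(P[0])
    v = list(v0)
    pi: List[int] = []
    for _ in range(iters):
        # Q-values of one Bellman backup: q[s][a] = R[a][s] + gamma * (P[a] v)[s]
        q = [[R[a][s] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        # Greedy step (ties broken toward the lowest action index)
        pi = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
        # Residual T_pi v - v
        residual = [q[s][pi[s]] - v[s] for s in range(n_states)]
        # Matrix I - lam * gamma * P_pi
        a_mat = [[(1.0 if s == t else 0.0) - lam * gamma * P[pi[s]][s][t]
                  for t in range(n_states)] for s in range(n_states)]
        # v_{k+1} = v_k + (I - lam*gamma*P_pi)^{-1} (T_pi v_k - v_k)
        delta = solve(a_mat, residual)
        v = [v[s] + delta[s] for s in range(n_states)]
    return v, pi
```

With `lam = 0` the linear system is the identity and each step is one Bellman optimality backup (Value Iteration). With `lam = 1` the solve computes the exact value of the greedy policy (Policy Iteration). Intermediate values of `lam` trade the cost of the solve against the amount of progress made per iteration.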
Main file
lpi_v0.pdf (313.3 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00185271, version 1 (05-11-2007)
inria-00185271, version 2 (09-11-2007)
inria-00185271, version 3 (13-11-2007)
inria-00185271, version 4 (03-10-2011)
inria-00185271, version 5 (11-10-2011)

Identifiers

  • HAL Id: inria-00185271, version 1
  • arXiv: 0711.0694

Cite

Bruno Scherrer. Performance bounds for Lambda Policy Iteration. [Research Report] 2007, pp.29. ⟨inria-00185271v1⟩