Performance bounds for Lambda Policy Iteration

Bruno Scherrer

Report (Research Report), Year: 2007

Role: Author
PersonId : 1406
IdHAL : bruno-scherrer
IdRef : 073360708

Autonomous intelligent machine

Abstract

We consider the discrete-time infinite-horizon discounted stationary optimal control problem formalized by Markov Decision Processes. We study Lambda Policy Iteration, a family of algorithms parameterized by lambda, originally introduced by Ioffe and Bertsekas. Lambda Policy Iteration generalizes the standard algorithms Value Iteration and Policy Iteration, and has some connections with TD(Lambda) introduced by Sutton & Barto. We deepen the original theory developed by Ioffe and Bertsekas by providing convergence rate bounds that generalize the standard bounds for Value Iteration described, for instance, by Puterman. We also develop the theory of this algorithm when it is used in an approximate form. In doing so, we extend and unify the separate analyses developed by Munos for Approximate Value Iteration and Approximate Policy Iteration.
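To illustrate how one family of updates can span both Value Iteration and Policy Iteration, here is a minimal sketch of a single exact Lambda Policy Iteration step on a small tabular MDP. It assumes the usual formulation in which the new value w is the fixed point of w = (1 - lambda) T_pi v + lambda T_pi w, where pi is greedy with respect to the current value v and T_pi is the Bellman operator of pi. It is not taken from the report: the function names, the array layout (`P[a][s][t]` for transition probabilities, `r[a][s]` for rewards) and the `sweeps` parameter are hypothetical, and the fixed point is only approximated by repeated substitution.

```python
def greedy_policy(P, r, v, gamma):
    """For each state s, pick an action maximizing r[a][s] + gamma * sum_t P[a][s][t] * v[t]."""
    n_actions, n_states = len(P), len(v)

    def q(a, s):
        return r[a][s] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))

    return [max(range(n_actions), key=lambda a: q(a, s)) for s in range(n_states)]


def lambda_pi_step(P, r, v, gamma, lam, sweeps=500):
    """One Lambda Policy Iteration step (illustrative sketch, hypothetical API).

    Returns (w, pi), where pi is greedy w.r.t. v and w approximates the
    fixed point of  w = (1 - lam) * T_pi(v) + lam * T_pi(w).
    lam = 0 gives a Value Iteration step; lam = 1 gives policy evaluation,
    i.e. a Policy Iteration step.
    """
    n = len(v)
    pi = greedy_policy(P, r, v, gamma)

    def T_pi(w):
        # Bellman operator of the fixed policy pi applied to w.
        return [
            r[pi[s]][s] + gamma * sum(P[pi[s]][s][t] * w[t] for t in range(n))
            for s in range(n)
        ]

    base = T_pi(v)
    w = list(v)
    # Repeated substitution; the map is a contraction of modulus lam * gamma.
    for _ in range(sweeps):
        Tw = T_pi(w)
        w = [(1 - lam) * base[s] + lam * Tw[s] for s in range(n)]
    return w, pi


# Hypothetical 2-state, 2-action MDP: action 0 stays, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
r = [
    [0.0, 1.0],  # reward for staying in state 0 / state 1
    [0.5, 0.0],  # reward for switching from state 0 / state 1
]
v_vi, _ = lambda_pi_step(P, r, [0.0, 0.0], gamma=0.5, lam=0.0)
v_pi, _ = lambda_pi_step(P, r, [0.0, 0.0], gamma=0.5, lam=1.0)
```

Intermediate values of `lam` interpolate between the two extremes; a practical implementation would more likely solve the underlying linear system directly than iterate a fixed number of sweeps.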

Keywords

Markov Decision Processes; Analysis of Algorithms; Bounds

Domains

Operations research [math.OC]; Artificial intelligence [cs.AI]

Main file

lpi_v0.pdf (313.3 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/inria-00185271

Submitted on: Monday, November 5, 2007, 16:31:00

Last modified on: Friday, March 24, 2023, 14:52:49

Long-term archiving on: Monday, April 12, 2010, 01:22:30

Dates and versions

inria-00185271 , version 1 (05-11-2007)

inria-00185271 , version 2 (09-11-2007)

inria-00185271 , version 3 (13-11-2007)

inria-00185271 , version 4 (03-10-2011)

inria-00185271 , version 5 (11-10-2011)

Identifiers

HAL Id: inria-00185271, version 1
arXiv: 0711.0694

Cite

Bruno Scherrer. Performance bounds for Lambda Policy Iteration. [Research Report] 2007, pp.29. ⟨inria-00185271v1⟩
