Analysis of a Classification-based Policy Iteration Algorithm

Alessandro Lazaric; Mohammad Ghavamzadeh; Remi Munos

Report (Technical Report), Year: 2010

Alessandro Lazaric

Role: Author
PersonId: 851
IdHAL: alessandro-lazaric
ORCID: 0000-0002-8970-413X
IdRef: 188701486

Sequential Learning

Mohammad Ghavamzadeh

Role: Author
PersonId: 868946

Sequential Learning

Remi Munos

Role: Author
PersonId: 836863

Sequential Learning

Abstract

We present a classification-based policy iteration algorithm, called Direct Policy Iteration, and provide its finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space, and a new capacity measure which indicates how well the policy space can approximate policies that are greedy w.r.t. any of its members. The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting. We also study the consistency of the method when there exists a sequence of policy spaces with increasing capacity.
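As a rough illustration of the quantities the abstract refers to (policy improvement steps, rollouts, and a restricted policy space), here is a minimal sketch of a classification-based policy iteration loop. It is not taken from the report: the chain MDP, the threshold policy space, the constants, and all function names are hypothetical choices made for this example, and the loss used is a simple cost-sensitive classification loss over rollout estimates.

```python
GAMMA = 0.9       # discount factor (assumed for this toy example)
N_STATES = 6      # chain with states 0..5 (hypothetical toy problem)
ACTIONS = (0, 1)  # 0 = left, 1 = right
HORIZON = 20      # rollout truncation length (assumed)


def step(state, action):
    """Deterministic chain dynamics: reward 1 whenever the next state is the right end."""
    nxt = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)


def threshold_policy(threshold):
    """Policy space: go right iff state >= threshold, for threshold in 0..N_STATES."""
    return lambda state: 1 if state >= threshold else 0


def rollout_q(state, action, policy):
    """Truncated rollout estimate of the action value of `action` in `state` under `policy`."""
    total, discount = 0.0, 1.0
    for _ in range(HORIZON):
        state, reward = step(state, action)
        total += discount * reward
        discount *= GAMMA
        action = policy(state)  # after the first step, follow the current policy
    return total


def improve(policy, rollout_states):
    """Pick the threshold whose policy minimises the empirical cost-sensitive loss."""
    q = {s: [rollout_q(s, a, policy) for a in ACTIONS] for s in rollout_states}

    def loss(threshold):
        candidate = threshold_policy(threshold)
        # Cost of a misclassification = value lost relative to the empirically greedy action.
        return sum(max(q[s]) - q[s][candidate(s)] for s in rollout_states)

    return min(range(N_STATES + 1), key=loss)


def run(iterations=3):
    """Run a few policy improvement steps, starting from the 'always left' policy."""
    threshold = N_STATES
    for _ in range(iterations):
        threshold = improve(threshold_policy(threshold), range(N_STATES))
    return threshold
```

In this toy setting `run()` ends at threshold 0, the "always right" policy. Because the dynamics are deterministic, one rollout per state-action pair suffices here; with stochastic dynamics the estimates would be averaged over several rollouts, which is where the number of rollouts in the abstract's bound would come into play.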

Domains

Computer Science

Main file

dpi-tech.pdf (280.32 KB)

Origin: Files produced by the author(s)

Contributor: Mohammad Ghavamzadeh

https://inria.hal.science/inria-00482065

Submitted on: Friday, May 7, 2010, 22:48:45

Last modified on: Friday, March 24, 2023, 14:52:53

Long-term archiving on: Thursday, September 16, 2010, 14:05:03

Dates and versions

inria-00482065, version 1 (07-05-2010)

inria-00482065, version 2 (25-01-2011)

inria-00482065, version 3 (30-01-2012)

Identifiers

HAL Id: inria-00482065, version 1

Cite

Alessandro Lazaric, Mohammad Ghavamzadeh, Remi Munos. Analysis of a Classification-based Policy Iteration Algorithm. [Technical Report] 2010. ⟨inria-00482065v1⟩


465 views

627 downloads
