Conservative and Greedy Approaches to Classification-based Policy Iteration

Mohammad Ghavamzadeh; Alessandro Lazaric

Conference paper, Year: 2012

Mohammad Ghavamzadeh

Role: Author
PersonId: 868946

Sequential Learning

Alessandro Lazaric

Role: Author
PersonId: 851
IdHAL: alessandro-lazaric
ORCID: 0000-0002-8970-413X
IdRef: 188701486

Sequential Learning

Abstract

Existing classification-based policy iteration (CBPI) algorithms can be divided into two categories: direct policy iteration (DPI) methods, which directly assign the output of the classifier (the approximate greedy policy w.r.t. the current policy) to the next policy, and conservative policy iteration (CPI) methods, in which the new policy is a mixture distribution of the current policy and the output of the classifier. The conservative policy update gives CPI a desirable feature, namely the guarantee that the policies generated by this algorithm improve at each iteration. We provide a detailed algorithmic and theoretical comparison of these two classes of CBPI algorithms. Our results reveal that in order to achieve the same level of accuracy, CPI requires more iterations, and thus more samples, than the DPI algorithm. Furthermore, CPI may converge to suboptimal policies whose performance is not better than DPI's.
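The difference between the two update rules described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small tabular setting in which a policy is a table of action probabilities per state and the classifier output is one action index per state. The names `dpi_update`, `cpi_update`, and the mixing weight `alpha` are hypothetical, and the abstract does not say how the mixing weight is chosen.

```python
from typing import List

# Assumption: a policy is a table, policy[s][a] = probability of action a in state s.
Policy = List[List[float]]


def one_hot(action: int, n_actions: int) -> List[float]:
    """Deterministic action choice written as a probability vector."""
    return [1.0 if a == action else 0.0 for a in range(n_actions)]


def dpi_update(current: Policy, classifier_actions: List[int]) -> Policy:
    """Greedy (DPI-style) update: the next policy is the classifier output itself."""
    n_actions = len(current[0])
    return [one_hot(a, n_actions) for a in classifier_actions]


def cpi_update(current: Policy, classifier_actions: List[int], alpha: float) -> Policy:
    """Conservative (CPI-style) update: mix the current policy with the classifier output.

    alpha is a hypothetical mixing weight in [0, 1]; alpha = 1 recovers the greedy update.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n_actions = len(current[0])
    mixed = []
    for probs, action in zip(current, classifier_actions):
        target = one_hot(action, n_actions)
        mixed.append([(1.0 - alpha) * p + alpha * t for p, t in zip(probs, target)])
    return mixed


if __name__ == "__main__":
    current = [[0.5, 0.5], [1.0, 0.0]]   # two states, two actions
    classifier_actions = [1, 1]          # classifier picks action 1 in both states
    print(dpi_update(current, classifier_actions))
    print(cpi_update(current, classifier_actions, alpha=0.25))
```

With a small `alpha`, each conservative step moves the policy only part of the way toward the classifier output, which is consistent with the abstract's observation that CPI needs more iterations than DPI to reach the same accuracy.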

Domains

Machine Learning [stat.ML]

Main file

Ghavamzadeh.pdf (133.7 KB)

Origin: Files produced by the author(s)

Contributor: Alessandro Lazaric

https://inria.hal.science/hal-00772610

Submitted on: Thursday, January 10, 2013, 18:12:36

Last modified on: Thursday, February 15, 2024, 03:32:31

Long-term archiving on: Thursday, April 11, 2013, 04:08:46

Dates and versions

hal-00772610, version 1 (January 10, 2013)

Identifiers

HAL Id: hal-00772610, version 1

Cite

Mohammad Ghavamzadeh, Alessandro Lazaric. Conservative and Greedy Approaches to Classification-based Policy Iteration. AAAI - 26th Conference on Artificial Intelligence, Jul 2012, Toronto, Canada. ⟨hal-00772610⟩

Collections

UNIV-RENNES1 UNIV-LILLE3 CNRS INRIA IRISA LAGIS INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

216 views

110 downloads
