Conservative and Greedy Approaches to Classification-based Policy Iteration

Mohammad Ghavamzadeh 1 Alessandro Lazaric 1
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract: The existing classification-based policy iteration (CBPI) algorithms can be divided into two categories: direct policy iteration (DPI) methods that directly assign the output of the classifier (the approximate greedy policy w.r.t. the current policy) to the next policy, and conservative policy iteration (CPI) methods in which the new policy is a mixture distribution of the current policy and the output of the classifier. The conservative policy update gives CPI a desirable feature, namely the guarantee that the policies generated by this algorithm improve at each iteration. We provide a detailed algorithmic and theoretical comparison of these two classes of CBPI algorithms. Our results reveal that in order to achieve the same level of accuracy, CPI requires more iterations, and thus, more samples than the DPI algorithm. Furthermore, CPI may converge to suboptimal policies whose performance is not better than DPI's.
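The two update rules contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tabular policy representation, the function names `dpi_update` and `cpi_update`, and the mixing coefficient `alpha` are assumptions introduced here for illustration, and the classifier output is simply given as a ready-made policy rather than learned from rollouts.

```python
from typing import List

# Assumed representation: policy[s][a] is the probability of action a in state s.
Policy = List[List[float]]


def dpi_update(current: Policy, classifier_output: Policy) -> Policy:
    """DPI-style step: the next policy is the classifier's output itself."""
    return [row[:] for row in classifier_output]


def cpi_update(current: Policy, classifier_output: Policy, alpha: float) -> Policy:
    """CPI-style step: the next policy is a mixture of the current policy
    and the classifier's output, weighted by the (assumed) coefficient alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * p_cur + alpha * p_new
         for p_cur, p_new in zip(cur_row, new_row)]
        for cur_row, new_row in zip(current, classifier_output)
    ]


# Hypothetical two-state, two-action example.
current = [[1.0, 0.0], [0.5, 0.5]]
greedy = [[0.0, 1.0], [0.0, 1.0]]  # stands in for the classifier's output

next_dpi = dpi_update(current, greedy)
next_cpi = cpi_update(current, greedy, alpha=0.25)
```

With `alpha = 1` the mixture step reduces to the direct assignment, and with a small `alpha` the new policy stays close to the current one. This is the sense in which the CPI update is conservative.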
Document type:
Conference paper
AAAI - 26th Conference on Artificial Intelligence, Jul 2012, Toronto, Canada. 2012

https://hal.inria.fr/hal-00772610
Contributor: Alessandro Lazaric
Submitted on: Thursday, January 10, 2013 - 18:12:36
Last modified on: Thursday, January 11, 2018 - 01:49:33
Document(s) archived on: Thursday, April 11, 2013 - 04:08:46

File

Ghavamzadeh.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00772610, version 1

Citation

Mohammad Ghavamzadeh, Alessandro Lazaric. Conservative and Greedy Approaches to Classification-based Policy Iteration. AAAI - 26th Conference on Artificial Intelligence, Jul 2012, Toronto, Canada. 2012. 〈hal-00772610〉
