Analysis of Classification-based Policy Iteration Algorithms

Alessandro Lazaric 1 Mohammad Ghavamzadeh 2, 1 Rémi Munos 3, 1
1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract : We introduce a variant of the classification-based approach to policy iteration which uses a cost-sensitive loss function weighting each classification mistake by its actual regret, that is, the difference between the action-value of the greedy action and of the action chosen by the classifier. For this algorithm, we provide a full finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space (classifier), and a capacity measure which indicates how well the policy space can approximate policies that are greedy with respect to any of its members. The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting. Furthermore it confirms the intuition that classification-based policy iteration algorithms could be favorably compared to value-based approaches when the policies can be approximated more easily than their corresponding value functions. We also study the consistency of the algorithm when there exists a sequence of policy spaces with increasing capacity.
Type de document :
Article dans une revue
Journal of Machine Learning Research, Journal of Machine Learning Research, 2016, 17, pp.1 - 30
Liste complète des métadonnées

Littérature citée [36 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/hal-01401513
Contributeur : Alessandro Lazaric <>
Soumis le : mercredi 23 novembre 2016 - 14:50:23
Dernière modification le : mardi 3 juillet 2018 - 11:26:06
Document(s) archivé(s) le : mardi 21 mars 2017 - 09:45:35

Fichier

10-364.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : hal-01401513, version 1

Collections

Citation

Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos. Analysis of Classification-based Policy Iteration Algorithms. Journal of Machine Learning Research, Journal of Machine Learning Research, 2016, 17, pp.1 - 30. 〈hal-01401513〉

Partager

Métriques

Consultations de la notice

248

Téléchargements de fichiers

51