Analysis of Classification-based Policy Iteration Algorithms

Alessandro Lazaric; Mohammad Ghavamzadeh; Rémi Munos

Journal article: Journal of Machine Learning Research. Year: 2016


Alessandro Lazaric
Role: Author
PersonId: 851
IdHAL: alessandro-lazaric
ORCID: 0000-0002-8970-413X
IdRef: 188701486
Affiliation: Sequential Learning

Mohammad Ghavamzadeh
Role: Author
PersonId: 868946
Affiliations: Adobe Systems Inc.; Sequential Learning

Rémi Munos
Role: Author
PersonId: 836863
Affiliations: DeepMind [London]; Sequential Learning

Abstract

We introduce a variant of the classification-based approach to policy iteration that uses a cost-sensitive loss function, weighting each classification mistake by its actual regret, that is, the difference between the action-value of the greedy action and that of the action chosen by the classifier. For this algorithm, we provide a full finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space (classifier), and a capacity measure that indicates how well the policy space can approximate policies that are greedy with respect to any of its members. The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting. Furthermore, it confirms the intuition that classification-based policy iteration algorithms could compare favorably to value-based approaches when the policies can be approximated more easily than their corresponding value functions. We also study the consistency of the algorithm when there exists a sequence of policy spaces with increasing capacity.
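The regret-weighted loss described in the abstract can be illustrated with a short Python sketch, contrasted with the plain 0/1 classification loss that counts every mistake equally. It assumes that estimates of the action-values Q(x_i, a) at a set of sampled states are already available (for instance, averaged rollout returns) and are passed in as a list of rows, and it averages the per-state cost over those states. The function names `regret_weighted_loss` and `zero_one_loss` are hypothetical and not taken from the paper, and the averaging over sampled states is an assumption of this sketch rather than the authors' exact estimator.

```python
from typing import Sequence


def regret_weighted_loss(q_hat: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Average regret of the classifier's choices over the sampled states (hypothetical helper).

    q_hat[i][a] is an (assumed precomputed) estimate of Q(x_i, a);
    actions[i] is the action the classifier picks at state x_i.
    """
    if not q_hat or len(q_hat) != len(actions):
        raise ValueError("need at least one state and exactly one action per state")
    total = 0.0
    for q_row, a in zip(q_hat, actions):
        # Regret = value of the greedy action minus value of the chosen action.
        # It is 0 when the choice is greedy and grows with the value lost.
        total += max(q_row) - q_row[a]
    return total / len(q_hat)


def zero_one_loss(q_hat: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Plain classification error against greedy labels, for comparison (hypothetical helper)."""
    if not q_hat or len(q_hat) != len(actions):
        raise ValueError("need at least one state and exactly one action per state")
    mistakes = 0
    for q_row, a in zip(q_hat, actions):
        # Greedy label; index() returns the first maximiser, so ties go to the lowest index.
        greedy = q_row.index(max(q_row))
        mistakes += int(a != greedy)
    return mistakes / len(q_hat)


# Four sampled states, two actions each; the classifier always picks action 1.
q_hat = [[1.0, 0.5], [2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
chosen = [1, 1, 1, 1]
print(regret_weighted_loss(q_hat, chosen))  # 0.625
print(zero_one_loss(q_hat, chosen))         # 0.75
```

Both losses agree that the classifier errs at the first two states, but the 0/1 loss weighs those mistakes equally and also penalises the tie at the last state. The regret-weighted loss charges 0.5 and 2.0 respectively and nothing for the tie, since either action there is greedy.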

Keywords

reinforcement learning; policy iteration; classification-based approach to policy iteration; finite-sample analysis

Domains

Machine Learning [stat.ML]

Main file

10-364.pdf (558.38 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01401513

Submitted on: Wednesday, 23 November 2016, 14:50:23

Last modified on: Wednesday, 24 January 2024, 09:54:23

Long-term archiving on: Tuesday, 21 March 2017, 09:45:35

Dates and versions

hal-01401513, version 1 (23-11-2016)

Identifiers

HAL Id: hal-01401513, version 1

Cite

Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos. Analysis of Classification-based Policy Iteration Algorithms. Journal of Machine Learning Research, 2016, 17, pp. 1-30. ⟨hal-01401513⟩


Collections

CNRS, INRIA, CRISTAL, INRIA2, CRISTAL-SEQUEL, UNIV-LILLE

