Classification-based Policy Iteration with a Critic

Victor Gabillon; Alessandro Lazaric; Mohammad Ghavamzadeh; Bruno Scherrer

Conference paper, Year: 2011

Victor Gabillon

Role: Author

Sequential Learning

Alessandro Lazaric

Role: Author
PersonId : 851
IdHAL : alessandro-lazaric
ORCID : 0000-0002-8970-413X
IdRef : 188701486

Sequential Learning

Mohammad Ghavamzadeh

Role: Author
PersonId : 868946

Sequential Learning

Bruno Scherrer

Role: Author
PersonId : 1406
IdHAL : bruno-scherrer
IdRef : 073360708

Autonomous intelligent machine

Abstract

In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use a critic to approximate the return after we truncate the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function. Therefore, the introduction of a critic can improve the accuracy of the rollout estimates, and as a result, enhance the performance of the RCPI algorithm. We present a new RCPI algorithm, called direct policy iteration with critic (DPI-Critic), and provide its finite-sample analysis when the critic is based on the LSTD method. We empirically evaluate the performance of DPI-Critic and compare it with DPI and LSPI in two benchmark reinforcement learning problems.
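The truncation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `truncated_rollout_q`, `step`, `policy`, `critic`, and `horizon` are hypothetical, the critic is any callable returning a value estimate (the paper uses an LSTD-based critic, which is not reproduced here), and the toy environment is invented purely so the example runs. The rollout accumulates discounted rewards for `horizon` steps and then substitutes the critic's estimate for the rest of the return.

```python
from typing import Callable, Tuple

State = int
Action = int


def truncated_rollout_q(
    state: State,
    action: Action,
    step: Callable[[State, Action], Tuple[State, float]],
    policy: Callable[[State], Action],
    critic: Callable[[State], float],
    gamma: float,
    horizon: int,
) -> float:
    """One-rollout estimate of Q^pi(state, action), truncated after `horizon` steps.

    Computes sum_{t < horizon} gamma^t * r_t + gamma^horizon * critic(s_horizon).
    The first transition uses `action`; later ones follow `policy`.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1 so that `action` is taken")
    ret, discount = 0.0, 1.0
    s, a = state, action
    for _ in range(horizon):
        s, r = step(s, a)      # hypothetical simulator: returns (next_state, reward)
        ret += discount * r
        discount *= gamma
        a = policy(s)
    # Replace the unobserved tail of the return with the critic's estimate.
    return ret + discount * critic(s)


# Toy setting (an assumption for illustration only): every step pays reward 1,
# so with gamma = 0.5 the true value of every state is 1 / (1 - 0.5) = 2.
def toy_step(s: State, a: Action) -> Tuple[State, float]:
    return s + a, 1.0


def toy_policy(s: State) -> Action:
    return 1


q_with_critic = truncated_rollout_q(0, 1, toy_step, toy_policy, lambda s: 2.0, 0.5, 3)
q_without_critic = truncated_rollout_q(0, 1, toy_step, toy_policy, lambda s: 0.0, 0.5, 3)
```

In this toy setting an exact critic removes the truncation bias entirely, while a zero critic leaves the estimate short by the discounted tail. In a stochastic problem one would typically average several such rollouts per state-action pair; a shorter horizon then reduces the variance of the estimate at the cost of relying more heavily on the critic's accuracy.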

Domains

Other [stat.ML]

Main file

dpi-critic.pdf (221.22 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-00644935

Submitted on: Friday, November 25, 2011, 15:24:52

Last modified on: Friday, March 24, 2023, 14:52:55

Long-term archiving on: Sunday, February 26, 2012, 02:32:01

Dates and versions

hal-00644935 , version 1 (25-11-2011)

Identifiers

HAL Id: hal-00644935, version 1

Cite

Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Bruno Scherrer. Classification-based Policy Iteration with a Critic. International Conference on Machine Learning (ICML), Jun 2011, Seattle, United States. pp.1049-1056. ⟨hal-00644935⟩

Collections

UNIV-LILLE3 CNRS INRIA LAGIS GRID5000 UNIV-LORRAINE INRIA2 LORIA SILECS
