Conference paper, Year: 2011

Classification-based Policy Iteration with a Critic

Victor Gabillon
Alessandro Lazaric
Mohammad Ghavamzadeh
Bruno Scherrer

Abstract

In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use a critic to approximate the return after we truncate the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function. Therefore, the introduction of a critic can improve the accuracy of the rollout estimates, and as a result, enhance the performance of the RCPI algorithm. We present a new RCPI algorithm, called direct policy iteration with critic (DPI-Critic), and provide its finite-sample analysis when the critic is based on the LSTD method. We empirically evaluate the performance of DPI-Critic and compare it with DPI and LSPI in two benchmark reinforcement learning problems.
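The core idea in the abstract, truncating a rollout and letting a critic stand in for the rest of the return, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `step`, `policy`, `critic`, `truncated_rollout_q`, and `averaged_q` are hypothetical, and the critic here is an arbitrary callable rather than an LSTD-based estimate. A short horizon leans more on the critic (lower variance, more bias if the critic is inaccurate); a long horizon leans more on sampled rewards.

```python
from typing import Callable, Tuple

State = float   # assumption: states are plain floats in this toy sketch
Action = int    # assumption: actions are small integers


def truncated_rollout_q(
    step: Callable[[State, Action], Tuple[State, float]],
    policy: Callable[[State], Action],
    critic: Callable[[State], float],
    state: State,
    action: Action,
    horizon: int,
    gamma: float,
) -> float:
    """Estimate Q(state, action) from one rollout truncated after `horizon` steps.

    The first step uses `action`; later steps follow `policy`. The return
    beyond the truncation point is approximated by `critic` at the last state.
    """
    ret = 0.0
    discount = 1.0
    s, a = state, action
    for _ in range(horizon):
        s, r = step(s, a)        # hypothetical simulator call: (next state, reward)
        ret += discount * r
        discount *= gamma
        a = policy(s)
    # Bootstrap: discounted critic value replaces the truncated tail.
    return ret + discount * critic(s)


def averaged_q(step, policy, critic, state, action, horizon, gamma, num_rollouts):
    """Average several truncated rollouts to reduce the variance of the estimate."""
    total = sum(
        truncated_rollout_q(step, policy, critic, state, action, horizon, gamma)
        for _ in range(num_rollouts)
    )
    return total / num_rollouts
```

With a deterministic toy simulator that always pays reward 1 and a critic equal to the true value 1 / (1 - gamma), the estimate is exact for any horizon, which is a quick sanity check of the bootstrapping step.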

Domains

Other [stat.ML]
Main file
dpi-critic.pdf (221.22 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00644935, version 1 (25-11-2011)

Identifiers

  • HAL Id: hal-00644935, version 1

Cite

Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Bruno Scherrer. Classification-based Policy Iteration with a Critic. International Conference on Machine Learning (ICML), Jun 2011, Seattle, United States. pp.1049-1056. ⟨hal-00644935⟩
153 views
85 downloads
