Classification-based Policy Iteration with a Critic

Victor Gabillon (1), Alessandro Lazaric (1), Mohammad Ghavamzadeh (1), Bruno Scherrer (2)
(1) SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
(2) MAIA - Autonomous intelligent machine
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use a critic to approximate the return after we truncate the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function. Therefore, the introduction of a critic can improve the accuracy of the rollout estimates, and as a result, enhance the performance of the RCPI algorithm. We present a new RCPI algorithm, called direct policy iteration with critic (DPI-Critic), and provide its finite-sample analysis when the critic is based on the LSTD method. We empirically evaluate the performance of DPI-Critic and compare it with DPI and LSPI in two benchmark reinforcement learning problems.
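The central idea of the abstract, truncating each rollout and letting a critic stand in for the remaining return, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `truncated_rollout_q`, its parameters, and the toy environment (`toy_step`, `toy_policy`) are assumptions made for illustration, and the critic is passed in as an arbitrary callable rather than being fitted with LSTD.

```python
import random

def truncated_rollout_q(state, action, policy, step, critic,
                        horizon, gamma, n_rollouts, rng):
    """Estimate Q(state, action) from rollouts truncated after `horizon` steps.

    Each rollout takes `action` first, then follows `policy`. The discounted
    rewards of the first `horizon` steps are summed, and the critic's value of
    the state reached at truncation, discounted by gamma**horizon, replaces
    the rest of the return. All names here are hypothetical.
    """
    total = 0.0
    for _ in range(n_rollouts):
        s, a = state, action
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            s, r = step(s, a, rng)      # sample next state and reward
            ret += discount * r
            discount *= gamma
            a = policy(s)               # follow the policy after the first action
        ret += discount * critic(s)     # critic approximates the truncated tail
        total += ret
    return total / n_rollouts

# Toy environment (an assumption for illustration): a deterministic chain
# that pays reward 1 at every step, so the true value is 1 / (1 - gamma).
def toy_step(s, a, rng):
    return s + 1, 1.0

def toy_policy(s):
    return 0

rng = random.Random(0)
q_with_critic = truncated_rollout_q(0, 0, toy_policy, toy_step,
                                    lambda s: 2.0, horizon=3, gamma=0.5,
                                    n_rollouts=4, rng=rng)
q_without_critic = truncated_rollout_q(0, 0, toy_policy, toy_step,
                                       lambda s: 0.0, horizon=3, gamma=0.5,
                                       n_rollouts=4, rng=rng)
```

In this toy chain an exact critic removes the truncation bias entirely, whereas a zero critic underestimates the return by the discounted tail. A shorter horizon leans more on the critic (lower variance, bias set by the critic's accuracy), while a longer horizon leans more on sampled rewards, which is the bias and variance trade-off the abstract refers to.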
Document type:
Conference paper
International Conference on Machine Learning (ICML), Jun 2011, Seattle, United States. ACM, pp.1049-1056, 2011, Proceedings of the 28th International Conference on Machine Learning

https://hal.inria.fr/hal-00644935
Contributor: Victor Gabillon
Submitted on: Friday, November 25, 2011 - 15:24:52
Last modified on: Thursday, January 11, 2018 - 06:22:13
Document(s) archived on: Sunday, February 26, 2012 - 02:32:01

File

dpi-critic.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00644935, version 1

Citation

Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Bruno Scherrer. Classification-based Policy Iteration with a Critic. International Conference on Machine Learning (ICML), Jun 2011, Seattle, United States. ACM, pp.1049-1056, 2011, Proceedings of the 28th International Conference on Machine Learning. 〈hal-00644935〉

Metrics

Record views

347

File downloads

106