hal-00644935, version 1

Classification-based Policy Iteration with a Critic

Victor Gabillon 1, Alessandro Lazaric 1, Mohammad Ghavamzadeh 1, Bruno Scherrer 2

International Conference on Machine Learning (ICML) (2011) 1049-1056

  • 1: SEQUEL (INRIA Lille - Nord Europe)
    http://www.inria.fr/equipes/sequel
    INRIA – CNRS : UMR8146 – Université Lille I - Sciences et technologies – Université Lille III - Sciences humaines et sociales – Ecole Centrale de Lille, France
  • 2: MAIA (INRIA Lorraine - LORIA)
    INRIA – CNRS : UMR7503 – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL), France

Bibliographic reference

  • Type of document: Peer-reviewed conferences/proceedings
  • Domain: Statistics/Other Statistics
  • Title: Classification-based Policy Iteration with a Critic
  • Abstract: In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use a critic to approximate the return after we truncate the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function. Therefore, the introduction of a critic can improve the accuracy of the rollout estimates, and as a result, enhance the performance of the RCPI algorithm. We present a new RCPI algorithm, called direct policy iteration with critic (DPI-Critic), and provide its finite-sample analysis when the critic is based on the LSTD method. We empirically evaluate the performance of DPI-Critic and compare it with DPI and LSPI in two benchmark reinforcement learning problems.
  • Full text language: English
  • Publication date: 2011-06-29
  • Audience: international
  • Conference title: International Conference on Machine Learning (ICML)
  • Conference city: Seattle
  • Country: United States
  • Conference date: 2011-06-28
  • Conference date (end): 2011-07-02
  • Publisher: ACM
  • Volume title: Proceedings of the 28th International Conference on Machine Learning
  • Pagination: 1049-1056
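
The idea stated in the abstract, truncating a rollout and letting a critic stand in for the remaining return, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's DPI-Critic implementation: the names `step`, `policy`, `critic`, `horizon`, and `gamma` are hypothetical, the environment is assumed to expose a one-step simulator, and the critic is any callable value estimate (the paper analyzes an LSTD-based critic, which is not reproduced here).

```python
from typing import Any, Callable, Tuple


def truncated_rollout_return(
    state: Any,
    action: Any,
    step: Callable[[Any, Any], Tuple[Any, float]],  # hypothetical simulator: (s, a) -> (s', r)
    policy: Callable[[Any], Any],                   # hypothetical policy being evaluated
    critic: Callable[[Any], float],                 # hypothetical value-function approximation
    horizon: int,
    gamma: float,
) -> float:
    """One rollout estimate of the action value of (state, action).

    Follows `action` first and `policy` afterwards for `horizon` steps,
    then adds the discounted critic value at the state where the
    rollout was truncated.
    """
    total = 0.0      # discounted sum of observed rewards
    discount = 1.0   # gamma ** t for the current step t
    s, a = state, action
    for _ in range(horizon):
        s, r = step(s, a)
        total += discount * r
        discount *= gamma
        a = policy(s)
    # Bootstrap: the critic approximates the return after truncation.
    return total + discount * critic(s)
```

A shorter `horizon` leans more on the critic (less rollout variance, more dependence on the critic's error), while a longer one leans more on sampled rewards; this is the bias and variance trade-off the abstract refers to. In a stochastic environment one would typically average several such returns per state-action pair.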

Attached file:

  • dpi-critic.pdf (PDF, 215.1 KB)
 
  • oai:hal.inria.fr:hal-00644935
  • Submitted on: Friday, 25 November 2011 15:24:52
  • Updated on: Saturday, 26 November 2011 09:52:42