hal-00644935, version 1
Classification-based Policy Iteration with a Critic
Victor Gabillon 1, Alessandro Lazaric 1, Mohammad Ghavamzadeh 1, Bruno Scherrer 2
International Conference on Machine Learning (ICML) (2011) 1049-1056
Abstract: In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use a critic to approximate the return after the rollout trajectories are truncated. This allows us to control the bias and variance of the rollout estimates of the action-value function. The introduction of a critic can therefore improve the accuracy of the rollout estimates and, as a result, enhance the performance of the RCPI algorithm. We present a new RCPI algorithm, called direct policy iteration with critic (DPI-Critic), and provide its finite-sample analysis when the critic is based on the LSTD method. We empirically evaluate the performance of DPI-Critic and compare it with DPI and LSPI on two benchmark reinforcement learning problems.
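The truncated rollout idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function and parameter names (`truncated_rollout_estimate`, `step`, `critic`, `horizon`) are hypothetical, the critic is any callable rather than an LSTD-based one, and the environment is a toy chain that pays a reward of 1 at every step. Each rollout sums the discounted rewards for `horizon` steps and then adds the discounted critic value at the state where the rollout was cut.

```python
from typing import Callable, Tuple

State = int
Action = int


def truncated_rollout_estimate(
    state: State,
    action: Action,
    policy: Callable[[State], Action],
    step: Callable[[State, Action], Tuple[State, float]],
    critic: Callable[[State], float],
    horizon: int,
    gamma: float,
    num_rollouts: int,
) -> float:
    """Average of truncated rollout returns, each completed by the critic.

    One rollout return is: sum_{t<horizon} gamma^t * r_t + gamma^horizon * critic(s_horizon).
    """
    total = 0.0
    for _ in range(num_rollouts):
        s, a = state, action          # first action is the one being evaluated
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            s, r = step(s, a)         # hypothetical environment interface
            ret += discount * r
            discount *= gamma
            a = policy(s)             # later actions follow the policy
        ret += discount * critic(s)   # critic replaces the truncated tail
        total += ret
    return total / num_rollouts


# Toy assumption: a chain that moves one state forward and always pays reward 1.
def constant_reward_step(state: State, action: Action) -> Tuple[State, float]:
    return state + 1, 1.0


GAMMA = 0.9
TRUE_VALUE = 1.0 / (1.0 - GAMMA)      # value of an infinite stream of reward 1

with_exact_critic = truncated_rollout_estimate(
    0, 0, lambda s: 0, constant_reward_step, lambda s: TRUE_VALUE,
    horizon=5, gamma=GAMMA, num_rollouts=3,
)
without_critic = truncated_rollout_estimate(
    0, 0, lambda s: 0, constant_reward_step, lambda s: 0.0,
    horizon=5, gamma=GAMMA, num_rollouts=3,
)
```

In this toy setting an exact critic removes the truncation bias entirely, while a zero critic underestimates the return by the discounted tail. With an imperfect critic, a shorter horizon would lower the variance of the rollout at the cost of relying more on the critic's error, which is the trade-off the abstract refers to.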
- 1 : SEQUEL (INRIA Lille - Nord Europe)
- INRIA – CNRS : UMR8146 – Université Lille I - Sciences et technologies – Université Lille III - Sciences humaines et sociales – Ecole Centrale de Lille
- 2 : MAIA (INRIA Lorraine - LORIA)
- INRIA – CNRS : UMR7503 – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL)
- Domain: Statistics/Other
- http://hal.inria.fr/hal-00644935
- oai:hal.inria.fr:hal-00644935
- Contributor: Victor Gabillon
- Submitted on: Friday 25 November 2011, 15:24:52
- Last modified on: Saturday 26 November 2011, 09:52:42





