inria-00482065, version 3
Analysis of a Classification-based Policy Iteration Algorithm
Alessandro Lazaric (a, 1), Mohammad Ghavamzadeh (1), Remi Munos (a, 1)
(2010)
Abstract: We present a classification-based policy iteration algorithm, called Direct Policy Iteration, and provide its finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space, and a new capacity measure which indicates how well the policy space can approximate policies that are greedy w.r.t. any of its members. The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting. We also study the consistency of the method when there exists a sequence of policy spaces with increasing capacity.
- a – INRIA
- 1 : SEQUEL (INRIA Lille - Nord Europe)
- INRIA – CNRS : UMR8022 – CNRS : UMR8146 – Université Lille 1 - Sciences et Technologies – Université Charles de Gaulle - Lille III – Ecole Centrale de Lille
- Domain: Cognitive science/Computer science
- Available versions: v1 (09-05-2010) v2 (26-01-2011) v3 (30-01-2012)
- http://hal.inria.fr/inria-00482065
- oai:hal.inria.fr:inria-00482065
- Contributor: Mohammad Ghavamzadeh
- Submitted on: Monday 30 January 2012, 13:56:45
- Last modified on: Monday 30 January 2012, 14:26:47





