
Analysis of a Classification-based Policy Iteration Algorithm

Alessandro Lazaric (1), Mohammad Ghavamzadeh (1), Remi Munos (1)
(1) SEQUEL - Sequential Learning (LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe)
Abstract : We present a classification-based policy iteration algorithm, called Direct Policy Iteration, and provide its finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space, and a new capacity measure which indicates how well the policy space can approximate policies that are greedy w.r.t. any of its members. The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting. We also study the consistency of the method when there exists a sequence of policy spaces with increasing capacity.
Document type : Conference papers
Contributor : Mohammad Ghavamzadeh
Submitted on : Monday, January 30, 2012 - 1:56:45 PM
Last modification on : Tuesday, November 24, 2020 - 2:18:20 PM
Long-term archiving on : Wednesday, December 14, 2016 - 2:29:10 AM


Files produced by the author(s)


  • HAL Id : inria-00482065, version 3



Alessandro Lazaric, Mohammad Ghavamzadeh, Remi Munos. Analysis of a Classification-based Policy Iteration Algorithm. ICML - 27th International Conference on Machine Learning, Jun 2010, Haifa, Israel. pp.607-614. ⟨inria-00482065v3⟩


