Approximate Modified Policy Iteration

Bruno Scherrer 1 Victor Gabillon 2 Mohammad Ghavamzadeh 2 Matthieu Geist 3, 4
1 MAIA - Autonomous Intelligent Machine, Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
2 SEQUEL - Sequential Learning, Inria Lille - Nord Europe, LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract: Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods as special cases. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide error propagation analyses that unify those for approximate policy and value iteration. For the classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value-function approximation error.
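To make the scheme concrete, here is a minimal sketch of exact (tabular) MPI, the algorithm that AMPI approximates: each iteration computes a policy greedy with respect to the current value function, then applies m Bellman backups under that policy. With m = 1 this reduces to value iteration; as m grows it approaches policy iteration. The MDP encoding (arrays P and R) and all names below are our own illustration, not the paper's code:

```python
import numpy as np

def modified_policy_iteration(P, R, gamma=0.9, m=5, n_iters=200):
    """Exact MPI on a tabular MDP.

    P: transition kernel, shape (A, S, S); P[a, s, s'] = Pr(s' | s, a)
    R: expected rewards, shape (A, S)
    m: number of Bellman backups per iteration
       (m = 1 gives value iteration; m -> infinity gives policy iteration)
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    pi = np.zeros(S, dtype=int)
    for _ in range(n_iters):
        # Greedy step: pi_{k+1} is greedy with respect to v_k.
        q = R + gamma * (P @ v)        # Q-values, shape (A, S)
        pi = q.argmax(axis=0)          # best action per state, shape (S,)
        # Partial evaluation step: v_{k+1} = (T_{pi_{k+1}})^m v_k.
        R_pi = R[pi, np.arange(S)]     # rewards under pi, shape (S,)
        P_pi = P[pi, np.arange(S)]     # transitions under pi, shape (S, S)
        for _ in range(m):
            v = R_pi + gamma * (P_pi @ v)
    return v, pi

# Tiny random MDP to exercise the routine.
rng = np.random.default_rng(0)
P = rng.random((2, 3, 3))
P /= P.sum(axis=2, keepdims=True)      # normalize rows into distributions
R = rng.random((2, 3))
v, pi = modified_policy_iteration(P, R)
print(v, pi)
```

In the approximate setting studied in the paper, the greedy and evaluation steps are no longer exact: they are carried out by regression (fitted-value and fitted-Q iteration) or by a classifier (classification-based policy iteration), and the error propagation analyses quantify how the resulting errors accumulate across iterations.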
Document type: Research report


https://hal.inria.fr/hal-00697169
Contributor: Bruno Scherrer
Submitted on: Wednesday, May 16, 2012 - 5:02:59 PM
Last modification on: Thursday, February 21, 2019 - 10:52:49 AM
Long-term archiving on: Friday, March 31, 2017 - 8:30:50 AM

Files

article.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-00697169, version 2
  • arXiv: 1205.3054

Citation

Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist. Approximate Modified Policy Iteration. [Research Report] 2012. ⟨hal-00697169v2⟩
