Approximate Modified Policy Iteration

Bruno Scherrer 1 Victor Gabillon 2 Mohammad Ghavamzadeh 2 Matthieu Geist 3, 4
1 MAIA - Autonomous intelligent machine
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods as special cases. Despite its generality, MPI has not been thoroughly studied, especially in its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three approximate MPI (AMPI) algorithms that are extensions of the well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis for AMPI that unifies those for approximate policy and value iteration. We also provide a finite-sample analysis for the classification-based implementation of AMPI (CBMPI), which is more general than (and in some sense contains) the analysis of the other presented AMPI algorithms. An interesting observation is that MPI's parameter allows us to control the balance of errors (in value function approximation and in estimating the greedy policy) in the final performance of the CBMPI algorithm.
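As a rough illustration of the exact (non-approximate) MPI scheme the abstract refers to, the sketch below alternates a greedy step with m applications of the evaluation operator of the greedy policy on a small tabular MDP. It is not taken from the report: the name m for MPI's parameter, the function names, and the tabular layout (P[s][a] as a list of next-state probabilities, R[s][a] as a scalar reward) are assumptions chosen for the example. With m = 1 the loop reduces to value iteration, and as m grows it approaches policy iteration.

```python
# Minimal sketch of exact modified policy iteration on a tabular MDP.
# Assumed (hypothetical) layout: P[s][a] is a list of next-state
# probabilities, R[s][a] is the expected immediate reward.

def q_value(P, R, v, gamma, s, a):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))

def greedy_policy(P, R, v, gamma):
    """Return a policy that is greedy with respect to v (ties go to the lowest action index)."""
    policy = []
    for s in range(len(P)):
        q = [q_value(P, R, v, gamma, s, a) for a in range(len(P[s]))]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy

def apply_policy_operator(P, R, v, policy, gamma):
    """Apply the evaluation operator of a fixed policy once to v."""
    return [q_value(P, R, v, gamma, s, policy[s]) for s in range(len(P))]

def modified_policy_iteration(P, R, gamma, m, n_iterations):
    """Alternate a greedy step with m applications of the policy operator."""
    v = [0.0] * len(P)
    policy = [0] * len(P)
    for _ in range(n_iterations):
        policy = greedy_policy(P, R, v, gamma)
        for _ in range(m):
            v = apply_policy_operator(P, R, v, policy, gamma)
    return policy, v

if __name__ == "__main__":
    # Toy two-state MDP (made up for illustration): staying in state 1 pays 1.
    P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    R = [[0.0, 0.0], [1.0, 0.0]]
    print(modified_policy_iteration(P, R, gamma=0.9, m=5, n_iterations=100))
```

The approximate variants discussed in the abstract replace the exact greedy and evaluation steps with fitted or classification-based estimates; that part is not sketched here.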
Document type :
Reports

https://hal.inria.fr/hal-00697169
Contributor : Bruno Scherrer
Submitted on : Monday, May 14, 2012 - 4:46:32 PM
Last modification on : Wednesday, July 31, 2019 - 4:18:02 PM
Document(s) archived on : Thursday, December 15, 2016 - 7:18:35 AM

Files

article.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00697169, version 1
  • ARXIV : 1205.3054

Citation

Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist. Approximate Modified Policy Iteration. [Research Report] 2012. ⟨hal-00697169v1⟩
