HAL Open Archive
Research Report, 2012

Approximate Modified Policy Iteration

Bruno Scherrer
Victor Gabillon
Mohammad Ghavamzadeh
Matthieu Geist

Abstract

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that generalizes the two celebrated policy iteration and value iteration methods. Despite this generality, MPI has not been thoroughly studied, especially in its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that extend well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide error-propagation analyses that unify those of approximate policy and value iteration. For the classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter controls the balance between the estimation error of the classifier and the overall quality of the value function approximation.
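The abstract's key point is MPI's single parameter m, which interpolates between value iteration and policy iteration: each iteration takes a greedy step with respect to the current value estimate, then applies m Bellman backups for the new policy. Below is a minimal tabular sketch of exact MPI, assuming a generic finite MDP; the arrays P and r, the function name mpi, and the fixed iteration budget are illustrative assumptions, not details from the paper.

```python
import numpy as np

def mpi(P, r, gamma, m, n_iters=100):
    """Sketch of exact (tabular) modified policy iteration.

    P: (A, S, S) transition kernels, r: (S, A) rewards,
    gamma: discount factor, m: number of partial-evaluation backups.
    Hypothetical interface, for illustration only.
    """
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    pi = np.zeros(n_states, dtype=int)
    for _ in range(n_iters):
        # Greedy step: pi_{k+1} is greedy with respect to the current v_k.
        q = r + gamma * np.stack([P[a] @ v for a in range(n_actions)], axis=1)
        pi = q.argmax(axis=1)
        # Partial evaluation: apply the Bellman operator T_{pi_{k+1}} m times.
        # m = 1 recovers value iteration; m -> infinity recovers policy iteration.
        r_pi = r[np.arange(n_states), pi]
        P_pi = P[pi, np.arange(n_states)]  # row s is P(. | s, pi(s))
        for _ in range(m):
            v = r_pi + gamma * P_pi @ v
    return v, pi
```

As the abstract notes, the approximate variants studied in the paper replace these exact greedy and evaluation steps with fitted regression or classification steps, and the error-propagation analyses quantify how the resulting errors accumulate.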
Main file
article.pdf (493.76 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00697169 , version 1 (14-05-2012)
hal-00697169 , version 2 (16-05-2012)

Identifiers

HAL Id : hal-00697169

Cite

Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist. Approximate Modified Policy Iteration. [Research Report] 2012. ⟨hal-00697169v2⟩
373 views
249 downloads
