Approximate Modified Policy Iteration - Inria - National Institute for Research in Digital Science and Technology
Conference paper, Year: 2012

Approximate Modified Policy Iteration

Bruno Scherrer
Mohammad Ghavamzadeh
  • Role: Author
  • PersonId : 868946
Victor Gabillon
  • Role: Author
  • PersonId : 925091

Abstract

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods as special cases. Despite its generality, MPI has not been thoroughly studied, especially in its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that extend well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those for approximate policy and value iteration. For the classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value function approximation.
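
As a rough illustration of how MPI sits between value iteration and policy iteration, the following is a minimal sketch of the exact, tabular algorithm, not of the approximate variants the paper proposes. It assumes the standard textbook formulation: a greedy step followed by m applications of the Bellman operator of the greedy policy, so that m = 1 behaves like value iteration and large m approaches policy iteration. The function name, the array layout, and the two-state toy MDP are all hypothetical and chosen only for this example.

```python
import numpy as np

def modified_policy_iteration(P, R, gamma, m, n_iters):
    """Exact tabular MPI sketch (hypothetical helper, not the paper's code).

    P: transition probabilities, shape (n_actions, n_states, n_states)
    R: expected rewards, shape (n_states, n_actions)
    m: number of Bellman backups under the greedy policy per iteration
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    v = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(n_iters):
        # Greedy step: pick the action maximizing the one-step lookahead.
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        policy = q.argmax(axis=1)
        # Partial evaluation: apply the greedy policy's Bellman operator m times.
        P_pi = P[policy, states]   # row s is the next-state distribution under policy[s]
        r_pi = R[states, policy]   # reward collected in s under policy[s]
        for _ in range(m):
            v = r_pi + gamma * P_pi @ v
    return v, policy

# Toy MDP (an assumption for illustration): action 0 stays put, action 1 moves
# to state 1; only state 1 pays a reward of 1 per step.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])

v_vi, pi_vi = modified_policy_iteration(P, R, gamma=0.9, m=1, n_iters=500)
v_mpi, pi_mpi = modified_policy_iteration(P, R, gamma=0.9, m=5, n_iters=200)
```

Both calls should converge to the same optimal value function on this toy problem; what changes with m is how much evaluation work is done per greedy step. In the approximate setting described in the abstract, the greedy and evaluation steps would be replaced by fitted regressors or a classifier trained on samples.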
Main file
icml-short.pdf (423.97 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00758882, version 1 (29-11-2012)

Identifiers

  • HAL Id: hal-00758882, version 1

Cite

Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, Matthieu Geist. Approximate Modified Policy Iteration. 29th International Conference on Machine Learning - ICML 2012, Jun 2012, Edinburgh, United Kingdom. ⟨hal-00758882⟩
