Approximate Modified Policy Iteration

Bruno Scherrer 1 Mohammad Ghavamzadeh 2 Victor Gabillon 2 Matthieu Geist 3, 4
1 MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract: Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those for approximate policy and value iteration. For the classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter allows one to control the balance between the estimation error of the classifier and the overall value function approximation.
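For readers unfamiliar with MPI, a minimal tabular sketch in Python may help. It is not taken from the paper and covers only the exact (non-approximate) algorithm. It assumes the standard formulation in which each iteration takes a greedy policy with respect to the current value estimate and then applies that policy's Bellman operator m times, so that m = 1 corresponds to value iteration and large m approaches policy iteration. The function name, the data layout (`P`, `R`), and the two-state toy MDP are illustrative assumptions.

```python
def modified_policy_iteration(P, R, gamma, m, n_iters):
    """Tabular MPI sketch (hypothetical helper, not the paper's code).

    P[s][a] is a list of (next_state, probability) pairs,
    R[s][a] is the expected immediate reward,
    gamma is the discount factor,
    m is the number of policy Bellman backups per iteration.
    """
    n_states = len(P)
    v = [0.0] * n_states
    policy = [0] * n_states

    def backup(s, a, values):
        # One-step lookahead: r(s, a) + gamma * E[values(s')]
        return R[s][a] + gamma * sum(p * values[s2] for s2, p in P[s][a])

    for _ in range(n_iters):
        # Greedy step: choose the action with the best one-step lookahead on v.
        policy = [
            max(range(len(P[s])), key=lambda a: backup(s, a, v))
            for s in range(n_states)
        ]
        # Partial evaluation: apply the policy's Bellman operator m times.
        for _ in range(m):
            v = [backup(s, policy[s], v) for s in range(n_states)]
    return policy, v


# Toy MDP (an assumption for illustration): two states, two actions each.
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: action 0 stays (reward 2), action 1 moves to state 0 (reward 0).
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(1, 1.0)], [(0, 1.0)]],
]
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]

vi_policy, vi_values = modified_policy_iteration(P, R, gamma=0.9, m=1, n_iters=200)
mpi_policy, mpi_values = modified_policy_iteration(P, R, gamma=0.9, m=5, n_iters=40)
```

Both calls perform the same total number of backups (200) and differ only in how often the greedy policy is recomputed, which is the trade-off that the parameter m governs.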
Document type:
Conference paper
29th International Conference on Machine Learning - ICML 2012, Jun 2012, Edinburgh, United Kingdom. 2012

Cited literature: 16 references

https://hal.inria.fr/hal-00758882
Contributor: Bruno Scherrer
Submitted on: Thursday, November 29, 2012 - 14:59:06
Last modified on: Thursday, April 5, 2018 - 12:30:11
Document(s) archived on: Saturday, December 17, 2016 - 17:32:16

File

icml-short.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00758882, version 1

Citation

Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, Matthieu Geist. Approximate Modified Policy Iteration. 29th International Conference on Machine Learning - ICML 2012, Jun 2012, Edinburgh, United Kingdom. 2012. 〈hal-00758882〉
