
hal-00697169, version 2

Approximate Modified Policy Iteration

Bruno Scherrer (http://www.loria.fr/~scherrer) 1; Victor Gabillon a, 2; Mohammad Ghavamzadeh 2; Matthieu Geist 3, 4

(2012)

Abstract: Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy iteration and value iteration methods as special cases. Despite its generality, MPI has not been thoroughly studied, especially in its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide error propagation analyses that unify those for approximate policy and value iteration. For the last, classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter makes it possible to control the balance between the estimation error of the classifier and the overall value function approximation.
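For readers unfamiliar with MPI, the following is a minimal sketch of the exact, tabular algorithm that the abstract refers to. It is not taken from the paper and does not show any of the three approximate implementations. It assumes the standard textbook formulation, in which each iteration takes a greedy policy with respect to the current value function and then applies that policy's Bellman operator m times. The function names, the nested-list MDP encoding, and the two-state example are all hypothetical choices made for illustration.

```python
# Sketch of exact (tabular) modified policy iteration.
# All names and the toy MDP below are illustrative assumptions.

def greedy_policy(P, R, gamma, v):
    """Return, for each state, an action maximizing the one-step lookahead."""
    n_states, n_actions = len(R), len(R[0])
    policy = []
    for s in range(n_states):
        q = [R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n_states))
             for a in range(n_actions)]
        policy.append(max(range(n_actions), key=lambda a: q[a]))
    return policy


def apply_policy_operator(P, R, gamma, policy, v):
    """Apply once the Bellman operator of a fixed policy to v."""
    n_states = len(R)
    return [R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n_states))
            for s in range(n_states)]


def modified_policy_iteration(P, R, gamma, m, n_iterations):
    """Alternate a greedy step with m applications of the policy operator.

    m = 1 corresponds to value iteration; as m grows, each iteration
    approaches a full policy evaluation, as in policy iteration.
    """
    v = [0.0] * len(R)
    policy = greedy_policy(P, R, gamma, v)
    for _ in range(n_iterations):
        policy = greedy_policy(P, R, gamma, v)
        for _ in range(m):
            v = apply_policy_operator(P, R, gamma, policy, v)
    return v, policy


# Hypothetical 2-state, 2-action MDP: P[s][a][t] is the probability of
# moving from state s to state t under action a; R[s][a] is the reward.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 1.0],
     [0.0, 2.0]]
v, policy = modified_policy_iteration(P, R, gamma=0.9, m=5, n_iterations=200)
```

In this sketch the parameter m is the knob the abstract calls MPI's main parameter: small values spend little effort on evaluating each policy, large values spend more.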

  • a –  INRIA
  • 1 :  MAIA (INRIA Nancy - Grand Est / LORIA)
  • INRIA – CNRS : UMR7503 – Université de Lorraine
  • 2 :  SEQUEL (INRIA Lille - Nord Europe)
  • INRIA – CNRS : UMR8146 – Université Lille I - Sciences et technologies – Université Lille III - Sciences humaines et sociales – Ecole Centrale de Lille
  • 3 :  SUPELEC-Campus Metz
  • SUPELEC
  • 4 :  Georgia Tech - CNRS (UMI2958)
  • CNRS : UMI2958 – Georgia Institute of Technology Atlanta – Georgia Tech Lorraine – SUPELEC – Université de Franche-Comté – Université Paul Verlaine - Metz – Ecole Nationale Supérieure des Arts et Metiers Metz
 
  • oai:hal.inria.fr:hal-00697169
  • Contributor: 
  • Submitted on: Wednesday, 16 May 2012, 17:02:59
  • Last modified on: Friday, 26 October 2012, 14:56:50