hal-00697169, version 1
Approximate Modified Policy Iteration
(2012)
Abstract: Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains both of the celebrated policy iteration and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially in its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three approximate MPI (AMPI) algorithms that are extensions of the well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis for AMPI that unifies those for approximate policy and value iteration. We also provide a finite-sample analysis for the classification-based implementation of AMPI (CBMPI), which is more general than (and in some sense contains) the analysis of the other presented AMPI algorithms. An interesting observation is that the MPI parameter allows us to control the balance of errors (in value function approximation and in estimating the greedy policy) in the final performance of the CBMPI algorithm.
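To make the relation between MPI, value iteration, and policy iteration concrete, the following is a minimal sketch of exact (tabular) MPI, not the approximate algorithms proposed in the paper. It assumes the standard textbook formulation: at each iteration the policy is made greedy with respect to the current value estimate, and the value is then updated by applying that policy's Bellman operator `m` times. The parameter name `m`, the function names, and the two-state MDP are illustrative assumptions and do not come from this record. With `m = 1` the loop reduces to value iteration; as `m` grows it approaches policy iteration.

```python
def backup(P, R, v, s, a, gamma):
    """One-step lookahead value of taking action a in state s under estimate v."""
    return R[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))


def greedy_policy(P, R, v, gamma):
    """For each state, pick the action with the largest one-step lookahead value."""
    policy = []
    for s in range(len(P)):
        q = [backup(P, R, v, s, a, gamma) for a in range(len(P[s]))]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy


def modified_policy_iteration(P, R, gamma, m, n_iterations):
    """Tabular MPI: greedy step, then m applications of the policy's Bellman operator.

    P[s][a] is a list of next-state probabilities, R[s][a] a scalar reward.
    m = 1 gives value iteration; large m approaches policy iteration.
    """
    v = [0.0] * len(P)
    policy = [0] * len(P)
    for _ in range(n_iterations):
        policy = greedy_policy(P, R, v, gamma)
        for _ in range(m):
            v = [backup(P, R, v, s, policy[s], gamma) for s in range(len(P))]
    return v, policy


# Hypothetical two-state, two-action MDP used only for illustration.
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: action 0 stays (reward 2), action 1 moves to state 0 (reward 0).
P_EXAMPLE = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
R_EXAMPLE = [
    [0.0, 1.0],
    [2.0, 0.0],
]

if __name__ == "__main__":
    for m in (1, 5):
        v, policy = modified_policy_iteration(P_EXAMPLE, R_EXAMPLE, 0.9, m, 200)
        print(m, policy, [round(x, 3) for x in v])
```

In this toy problem both settings of `m` reach the same greedy policy; the choice of `m` only changes how much evaluation work is done between greedy steps, which is the knob the abstract refers to when it mentions balancing value-approximation error against greedy-policy estimation error.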
- a – INRIA
- 1: INRIA – CNRS : UMR7503 – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL)
- 2: INRIA – CNRS : UMR8146 – Université Lille I - Sciences et technologies – Université Lille III - Sciences humaines et sociales – Ecole Centrale de Lille
- 3: SUPELEC
- 4: CNRS : UMI2958 – Georgia Institute of Technology Atlanta – Georgia Tech Lorraine – SUPELEC – Université de Franche-Comté – Université Paul Verlaine - Metz – Ecole Nationale Supérieure des Arts et Metiers Metz
- Domain : Computer Science/Artificial Intelligence
- Available versions : v1 (2012-05-14) v2 (2012-05-18)
- http://hal.inria.fr/hal-00697169
- oai:hal.inria.fr:hal-00697169
- Submitted on: Monday, 14 May 2012 16:46:32
- Updated on: Monday, 14 May 2012 17:01:34