Approximate Modified Policy Iteration

Bruno Scherrer; Victor Gabillon; Mohammad Ghavamzadeh; Matthieu Geist

Report (Research Report) Year: 2012


Bruno Scherrer

Role: Author
PersonId: 1406
IdHAL: bruno-scherrer
IdRef: 073360708

Autonomous intelligent machine

Victor Gabillon

Role: Author
PersonId: 925091

Sequential Learning

Mohammad Ghavamzadeh

Role: Author
PersonId: 868946

Sequential Learning

Matthieu Geist

Role: Author
PersonId: 6945
IdHAL: matthieu-geist

SUPELEC-Campus Metz

Georgia Tech Lorraine [Metz]

Abstract

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated methods, policy iteration and value iteration. Despite its generality, MPI has not been thoroughly studied, especially in its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three approximate MPI (AMPI) algorithms that are extensions of well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis for AMPI that unifies those for approximate policy iteration and approximate value iteration. We also provide a finite-sample analysis for the classification-based implementation of AMPI (CBMPI), which is more general than (and in a sense contains) the analysis of the other AMPI algorithms presented. An interesting observation is that the MPI parameter allows us to control the balance between two sources of error (value function approximation and estimation of the greedy policy) in the final performance of the CBMPI algorithm.
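
As a rough illustration of how MPI sits between value iteration and policy iteration, the sketch below runs exact (tabular) MPI on a made-up two-state MDP. It is not one of the approximate algorithms proposed in the report. The MDP, the function names, and the use of the symbol `m` for the MPI parameter are assumptions made for this example: each iteration takes the greedy policy with respect to the current value estimate and then applies that policy's Bellman operator `m` times, so `m = 1` behaves like value iteration and a large `m` approaches policy iteration.

```python
# Hypothetical 2-state, 2-action MDP, used only for illustration.
# P[s][a] is a list of (next_state, probability); R[s][a] is the reward.
P = [
    [[(0, 0.9), (1, 0.1)], [(0, 0.2), (1, 0.8)]],
    [[(0, 0.7), (1, 0.3)], [(1, 1.0)]],
]
R = [[0.0, 1.0], [0.5, 2.0]]
GAMMA = 0.9  # discount factor (assumed value)


def q_value(v, s, a):
    """One-step lookahead: r(s, a) + gamma * E[v(s')]."""
    return R[s][a] + GAMMA * sum(p * v[s2] for s2, p in P[s][a])


def greedy(v):
    """Greedy policy with respect to the value estimate v."""
    return [max(range(len(R[s])), key=lambda a: q_value(v, s, a))
            for s in range(len(R))]


def modified_policy_iteration(m, iterations=200):
    """Exact MPI: pi_{k+1} = greedy(v_k), then v_{k+1} = (T_pi_{k+1})^m v_k."""
    v = [0.0] * len(R)
    pi = greedy(v)
    for _ in range(iterations):
        pi = greedy(v)
        for _ in range(m):  # m applications of the Bellman operator of pi
            v = [q_value(v, s, pi[s]) for s in range(len(R))]
    return v, pi


v_vi, pi_vi = modified_policy_iteration(m=1)   # behaves like value iteration
v_pi, pi_pi = modified_policy_iteration(m=50)  # close to policy iteration
```

On this toy problem both settings should reach the same policy and nearly identical values; they differ in how much evaluation work is done per greedy step, which is the trade-off the MPI parameter controls.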

Domains

Artificial Intelligence [cs.AI]

Main file

article.pdf (474.24 KB)

Origin: Files produced by the author(s)

Contributor: Bruno Scherrer

https://inria.hal.science/hal-00697169

Submitted on: Monday, May 14, 2012, 16:46:32

Last modified on: Thursday, April 13, 2023, 09:26:12

Long-term archiving on: Thursday, December 15, 2016, 07:18:35

Dates and versions

hal-00697169, version 1 (14-05-2012)

hal-00697169, version 2 (16-05-2012)

Identifiers

HAL Id: hal-00697169, version 1
arXiv: 1205.3054

Cite

Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist. Approximate Modified Policy Iteration. [Research Report] 2012. ⟨hal-00697169v1⟩


Collections

SUP_IMS

