Approximate Modified Policy Iteration

Bruno Scherrer; Mohammad Ghavamzadeh; Victor Gabillon; Matthieu Geist

Conference paper, Year: 2012


Bruno Scherrer

Role: Author
PersonId: 1406
IdHAL: bruno-scherrer
IdRef: 073360708

Autonomous intelligent machine

Mohammad Ghavamzadeh

Role: Author
PersonId: 868946

Sequential Learning

Victor Gabillon

Role: Author
PersonId: 925091

Sequential Learning

Matthieu Geist

Role: Author
PersonId: 6945
IdHAL: matthieu-geist

SUPELEC-Campus Metz

Georgia Tech Lorraine [Metz]

Abstract

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy iteration and value iteration methods as special cases. Despite its generality, MPI has not been thoroughly studied, especially in its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that extend well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those for approximate policy iteration and approximate value iteration. For the classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter makes it possible to control the balance between the estimation error of the classifier and the overall value function approximation.
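To illustrate how MPI contains both methods, here is a minimal sketch of exact (not approximate) MPI in its standard formulation, assuming a small finite MDP whose model is fully known. It is not one of the paper's three AMPI implementations. Each iteration takes the greedy policy with respect to the current value function and then applies that policy's Bellman operator T_π exactly m times: m = 1 gives value iteration, and letting m grow without bound recovers policy iteration. The function names (`greedy`, `apply_T_pi`, `mpi`) and the two-state MDP are hypothetical illustrations.

```python
# Exact modified policy iteration on a hypothetical tabular MDP.
# P[a][s][t] = Pr(next state t | state s, action a); R[s][a] = expected reward.

def q_values(P, R, v, gamma):
    """Q(s, a) = R(s, a) + gamma * sum_t P(t | s, a) * v(t)."""
    n_states, n_actions = len(R), len(R[0])
    return [[R[s][a] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)]

def greedy(P, R, v, gamma):
    """Greedy policy w.r.t. v: pi(s) = argmax_a Q(s, a) (first index on ties)."""
    q = q_values(P, R, v, gamma)
    return [max(range(len(row)), key=row.__getitem__) for row in q]

def apply_T_pi(P, R, pi, v, gamma):
    """One application of the policy Bellman operator T_pi to v."""
    n_states = len(R)
    return [R[s][pi[s]] + gamma * sum(P[pi[s]][s][t] * v[t] for t in range(n_states))
            for s in range(n_states)]

def mpi(P, R, gamma, m, n_iter):
    """Run n_iter MPI iterations with parameter m >= 1, starting from v = 0."""
    v = [0.0] * len(R)
    pi = None
    for _ in range(n_iter):
        pi = greedy(P, R, v, gamma)            # policy improvement step
        for _ in range(m):                     # partial evaluation: v <- (T_pi)^m v
            v = apply_T_pi(P, R, pi, v, gamma)
    return pi, v

# Hypothetical two-state, two-action MDP.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch to the other state
]
R = [[0.0, 1.0],   # state 0: staying pays 0, switching pays 1
     [2.0, 0.0]]   # state 1: staying pays 2, switching pays 0

for m in (1, 10, 200):  # m = 1 is value iteration; large m approaches policy iteration
    pi, v = mpi(P, R, gamma=0.9, m=m, n_iter=300)
    print(m, pi, [round(x, 6) for x in v])
```

With discount 0.9, every setting of m should converge to the same optimal policy (switch in state 0, stay in state 1). A larger m spends more Bellman backups on evaluating each policy before improving it; in the approximate setting, this is presumably the "main parameter" whose trade-off the abstract refers to.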

Domains

Artificial Intelligence [cs.AI]

Main file

icml-short.pdf (423.97 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-00758882

Submitted on: Thursday, 29 November 2012, 14:59:06

Last modified on: Thursday, 1 February 2024, 10:06:02

Long-term archiving on: Saturday, 17 December 2016, 17:32:16

Dates and versions

hal-00758882, version 1 (29-11-2012)

Identifiers

HAL Id: hal-00758882, version 1

Cite

Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, Matthieu Geist. Approximate Modified Policy Iteration. 29th International Conference on Machine Learning - ICML 2012, Jun 2012, Edinburgh, United Kingdom. ⟨hal-00758882⟩


Collections

SUPELEC UNIV-RENNES1 UNIV-LILLE3 CNRS INRIA UNIV-FCOMTE IRISA SUP_IMS LAGIS UMI-GTL UNIV-LORRAINE INRIA2 LORIA LORIA-AIS UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

233 Views

158 Downloads
