Approximate Modified Policy Iteration and its Application to the Game of Tetris

Bruno Scherrer (1, 2, 3), Mohammad Ghavamzadeh (4), Victor Gabillon (4), Boris Lesner (1), Matthieu Geist (5)
1 MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
2 BIGS - Biology, genetics and statistics
Inria Nancy - Grand Est, IECL - Institut Élie Cartan de Lorraine
3 Probabilités et statistiques
IECL - Institut Élie Cartan de Lorraine
4 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
5 IMS - Equipe Information, Multimodalité et Signal
UMI2958 - Georgia Tech - CNRS [Metz], SUPELEC-Campus Metz
Abstract: Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that generalizes the two celebrated policy iteration and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that extend the well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those of approximate policy and value iteration, and we develop a finite-sample analysis of these algorithms that highlights the influence of their parameters. For the classification-based version of the algorithm (CBMPI), the analysis shows that MPI's main parameter controls the balance between the estimation error of the classifier and the overall quality of the value function approximation. We illustrate and evaluate the behavior of these new algorithms on the Mountain Car and Tetris problems. Remarkably, in Tetris, CBMPI outperforms the existing DP approaches by a large margin and competes with the current state-of-the-art methods while using fewer samples.
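To make the scheme concrete, below is a minimal tabular sketch of exact MPI, the algorithm that AMPI approximates. This is an illustrative implementation, not the authors' code; the MDP layout (P as per-action transition matrices, R as a state-by-action reward table) and all names are assumptions. Setting m = 1 recovers value iteration, while letting m grow large recovers policy iteration.

    import numpy as np

    def modified_policy_iteration(P, R, gamma=0.95, m=5, n_iter=100):
        # P: array of shape (n_actions, n_states, n_states); P[a][s, s'] = transition prob.
        # R: array of shape (n_states, n_actions); R[s, a] = expected reward.
        n_states, n_actions = R.shape
        v = np.zeros(n_states)
        for _ in range(n_iter):
            # Greedy step: pi is greedy with respect to the current value v.
            q = R + gamma * np.stack([P[a] @ v for a in range(n_actions)], axis=1)
            pi = q.argmax(axis=1)
            # Partial evaluation: apply the Bellman operator of pi exactly m times.
            r_pi = R[np.arange(n_states), pi]
            P_pi = P[pi, np.arange(n_states), :]  # row s taken from P[pi[s]]
            for _ in range(m):
                v = r_pi + gamma * P_pi @ v
        return v, pi

    # Tiny 2-state, 2-action example (hypothetical numbers):
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.6, 0.4]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    v, pi = modified_policy_iteration(P, R, gamma=0.9, m=10)

The AMPI variants studied in the paper replace the two exact steps above with sample-based approximations: regression for the evaluation step (fitted-value and fitted-Q iteration) and, in CBMPI, a classifier for the greedy step.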

https://hal.inria.fr/hal-01091341
Contributor: Bruno Scherrer
Submitted on: Monday, December 8, 2014 - 12:17:02
Last modified on: Tuesday, July 3, 2018 - 11:48:41
Document(s) archived on: Saturday, April 15, 2017 - 03:55:52

Files

final.pdf
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : hal-01091341, version 1

Citation

Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, Boris Lesner, Matthieu Geist. Approximate Modified Policy Iteration and its Application to the Game of Tetris. Journal of Machine Learning Research, 2015, 16, pp. 1629-1676. ⟨hal-01091341⟩

Metrics

Record views: 993
File downloads: 501