Approximate Modified Policy Iteration and its Application to the Game of Tetris

Bruno Scherrer (1, 2, 3), Mohammad Ghavamzadeh (4), Victor Gabillon (4), Boris Lesner (1), Matthieu Geist (5)
1 MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
2 BIGS - Biology, genetics and statistics
Inria Nancy - Grand Est, IECL - Institut Élie Cartan de Lorraine
3 Probabilités et statistiques
IECL - Institut Élie Cartan de Lorraine
4 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
5 IMS - Equipe Information, Multimodalité et Signal
UMI2958 - Georgia Tech - CNRS [Metz], SUPELEC-Campus Metz
Abstract: Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that generalizes the two celebrated policy iteration and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that extend the well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those of approximate policy and value iteration, and we develop a finite-sample analysis of these algorithms that highlights the influence of their parameters. For the classification-based version of the algorithm (CBMPI), the analysis shows that MPI's main parameter controls the balance between the estimation error of the classifier and the overall quality of the value function approximation. We illustrate and evaluate the behavior of these new algorithms on the Mountain Car and Tetris problems. Remarkably, in Tetris, CBMPI outperforms the existing DP approaches by a large margin and competes with the current state-of-the-art methods while using fewer samples.
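To make the scheme concrete, below is a minimal tabular sketch of exact MPI, the algorithm that AMPI approximates. This is an illustrative implementation, not the authors' code; the MDP layout (P as per-action transition matrices, R as a state-by-action reward table) and all names are assumptions. Setting m = 1 recovers value iteration, while letting m grow large recovers policy iteration.

    import numpy as np

    def modified_policy_iteration(P, R, gamma=0.95, m=5, n_iter=100):
        # P: array of shape (n_actions, n_states, n_states); P[a][s, s'] = transition prob.
        # R: array of shape (n_states, n_actions); R[s, a] = expected reward.
        n_states, n_actions = R.shape
        v = np.zeros(n_states)
        for _ in range(n_iter):
            # Greedy step: pi is greedy with respect to the current value v.
            q = R + gamma * np.stack([P[a] @ v for a in range(n_actions)], axis=1)
            pi = q.argmax(axis=1)
            # Partial evaluation: apply the Bellman operator of pi exactly m times.
            r_pi = R[np.arange(n_states), pi]
            P_pi = P[pi, np.arange(n_states), :]  # row s taken from P[pi[s]]
            for _ in range(m):
                v = r_pi + gamma * P_pi @ v
        return v, pi

    # Tiny 2-state, 2-action example (hypothetical numbers):
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.6, 0.4]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    v, pi = modified_policy_iteration(P, R, gamma=0.9, m=10)

The AMPI variants studied in the paper replace the two exact steps above with sample-based approximations: regression for the evaluation step (fitted-value and fitted-Q iteration) and, in CBMPI, a classifier for the greedy step.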

https://hal.inria.fr/hal-01091341
Contributor: Bruno Scherrer
Submitted on: Monday, December 8, 2014 - 12:17:02
Last modified on: Tuesday, July 3, 2018 - 11:48:41
Document(s) archived on: Saturday, April 15, 2017 - 03:55:52

Files

final.pdf
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : hal-01091341, version 1

Citation

Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, Boris Lesner, Matthieu Geist. Approximate Modified Policy Iteration and its Application to the Game of Tetris. Journal of Machine Learning Research, 2015, 16, pp. 1629-1676. ⟨hal-01091341⟩

Metrics

Record views: 993
File downloads: 501