Approximate Dynamic Programming Finally Performs Well in the Game of Tetris

Victor Gabillon; Mohammad Ghavamzadeh; Bruno Scherrer

Communication Dans Un Congrès Année : 2013

Approximate Dynamic Programming Finally Performs Well in the Game of Tetris

(1) , (1) , (2)

1
2

Victor Gabillon

Fonction : Auteur
PersonId : 925091

Sequential Learning

Mohammad Ghavamzadeh

Fonction : Auteur
PersonId : 868946

Sequential Learning

Bruno Scherrer

Fonction : Auteur
PersonId : 1406
IdHAL : bruno-scherrer
IdRef : 073360708

Autonomous intelligent machine

Résumé

Tetris is a video game that has been widely used as a benchmark for various optimization techniques including approximate dynamic programming (ADP) algorithms. A look at the literature of this game shows that while ADP algorithms that have been (almost) entirely based on approximating the value function (value function based) have performed poorly in Tetris, the methods that search directly in the space of policies by learning the policy parameters using an optimization black box, such as the cross entropy (CE) method, have achieved the best reported results. This makes us conjecture that Tetris is a game in which good policies are easier to represent, and thus, learn than their corresponding value functions. So, in order to obtain a good performance with ADP, we should use ADP algorithms that search in a policy space, instead of the more traditional ones that search in a value function space. In this paper, we put our conjecture to test by applying such an ADP algorithm, called classification-based modified policy iteration (CBMPI), to the game of Tetris. Our experimental results show that for the first time an ADP algorithm, namely CBMPI, obtains the best results reported in the literature for Tetris in both small $10\times 10$ and large $10\times 20$ boards. Although the CBMPI's results are similar to those of the CE method in the large board, CBMPI uses considerably fewer (almost 1/6) samples (calls to the generative model) than CE.

Domaines

Apprentissage [cs.LG] Recherche opérationnelle [math.OC] Optimisation et contrôle [math.OC]

Fichier principal

tetris.pdf (186.25 Ko)

Origine : Fichiers produits par l'(les) auteur(s)

Bruno Scherrer : Connectez-vous pour contacter le contributeur

https://inria.hal.science/hal-00921250

Soumis le : vendredi 20 décembre 2013-10:11:11

Dernière modification le : jeudi 1 février 2024-10:04:26

Archivage à long terme le : vendredi 21 mars 2014-10:00:42

Dates et versions

hal-00921250 , version 1 (20-12-2013)

Identifiants

HAL Id : hal-00921250 , version 1

Citer

Victor Gabillon, Mohammad Ghavamzadeh, Bruno Scherrer. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris. Neural Information Processing Systems (NIPS) 2013, Dec 2013, South Lake Tahoe, United States. ⟨hal-00921250⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

UNIV-RENNES1 UNIV-LILLE3 CNRS INRIA IRISA INSMI LAGIS UNIV-LORRAINE INRIA2 TDS-MACS LORIA LORIA-AIS UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

582 Consultations

1247 Téléchargements

Approximate Dynamic Programming Finally Performs Well in the Game of Tetris

Résumé

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager