Online Sparse bandit for Card Games

David Saint-Pierre 1, 2, 3, Quentin Louveaux 1, Olivier Teytaud 2, 3
3 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract: Finding an approximation of a Nash equilibrium in matrix games is an important topic whose relevance reaches beyond matrix games themselves. A bandit algorithm commonly used to approximate a Nash equilibrium is EXP3 [?]. However, the solution to many problems is often sparse, yet EXP3 inherently fails to exploit this property. To the best of the authors' knowledge, only an offline truncation proposed by [?] addresses this issue. In this paper, we propose a variation of EXP3 that exploits the sparsity of the solution by dynamically removing arms; the resulting algorithm empirically performs better than previous versions. We apply the resulting algorithm to an MCTS program for the Urban Rivals card game.
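As a rough illustration of the kind of algorithm described in the abstract, the Python sketch below implements a standard EXP3 update together with a simple arm-removal step. The function name, parameters, and in particular the pruning rule (dropping arms whose mixed probability stays below a fixed threshold after a warm-up period) are assumptions made for illustration only; they are not the truncation or removal scheme of the paper itself.

    import math
    import random

    def exp3_with_pruning(n_arms, reward_fn, horizon,
                          gamma=0.07, prune_after=1000, prune_threshold=0.01):
        """Minimal EXP3 sketch with a hypothetical arm-removal step.

        The pruning rule (discard arms whose mixed probability is below
        `prune_threshold` after `prune_after` rounds) is an illustrative
        assumption, not the scheme from the paper.
        """
        active = list(range(n_arms))
        weights = {a: 1.0 for a in active}

        for t in range(1, horizon + 1):
            k = len(active)
            total = sum(weights[a] for a in active)
            # EXP3 mixed distribution: exploit the weights, explore uniformly.
            probs = {a: (1 - gamma) * weights[a] / total + gamma / k for a in active}

            arm = random.choices(active, weights=[probs[a] for a in active])[0]
            reward = reward_fn(arm)  # reward assumed to lie in [0, 1]

            # Importance-weighted reward estimate and multiplicative update.
            estimate = reward / probs[arm]
            weights[arm] *= math.exp(gamma * estimate / k)

            # Hypothetical sparsity step: drop arms that look negligible.
            if t >= prune_after and len(active) > 1:
                survivors = [a for a in active if probs[a] >= prune_threshold]
                if survivors:
                    active = survivors

        total = sum(weights[a] for a in active)
        # Normalized weights over surviving arms: an approximate (sparse) strategy.
        return {a: weights[a] / total for a in active}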
Document type:
Conference paper
Advances in Computer Games, Nov 2011, Tilburg, Netherlands. 2012

https://hal.inria.fr/hal-01116714
Contributor: Olivier Teytaud
Submitted on: Tuesday, February 17, 2015 - 09:26:32
Last modified on: Thursday, April 5, 2018 - 12:30:12
Document(s) archived on: Monday, May 18, 2015 - 10:05:46

File

onlinesparse.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01116714, version 1


Citation

David Saint-Pierre, Quentin Louveaux, Olivier Teytaud. Online Sparse bandit for Card Games. Advances in Computer Games, Nov 2011, Tilburg, Netherlands. 2012. 〈hal-01116714〉


Metrics

Record views: 264
File downloads: 122