Nash and the Bandit Approach for Adversarial Portfolios

David L. Saint-Pierre (1, 2), Olivier Teytaud (1, 2)
1 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract: In this paper we study the use of a portfolio of policies for adversarial problems. We use two different portfolios of policies and apply them to the game of Go. The first portfolio is composed of different versions of the GnuGo agent. The second portfolio is composed of fixed random seeds. First, we demonstrate that learning an offline combination of these policies using the notion of Nash equilibrium generates a stronger opponent. Second, we show that such distributions can be learned online through a bandit approach. The advantages of our approach are (i) diversity (the Nash-Portfolio is more variable than its components), (ii) adaptivity (the Bandit-Portfolio adapts to the opponent), (iii) simplicity (no computational overhead), and (iv) increased performance. Due to the importance of games on mobile devices, designing artificial intelligences that run with little computational power is crucial; our approach is particularly suited to mobile devices, since it creates a stronger opponent simply by biasing the distribution over the policies, and moreover it generalizes quite well.
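The online part of the abstract (learning a distribution over the portfolio through a bandit) can be illustrated with a minimal sketch. The abstract does not name the bandit algorithm, so Exp3, a standard adversarial bandit, is used here as an assumption. The class `Exp3Portfolio`, the function `play_games`, the parameter `gamma`, and the fixed per-policy win rates standing in for a real opponent are all hypothetical and not taken from the paper.

```python
import math
import random


class Exp3Portfolio:
    """Exp3 bandit over a fixed portfolio of policies (illustrative helper)."""

    def __init__(self, n_policies, gamma=0.1, seed=0):
        self.k = n_policies
        self.gamma = gamma                  # exploration rate in (0, 1]
        self.weights = [1.0] * n_policies   # one weight per policy
        self.rng = random.Random(seed)

    def distribution(self):
        # Mix the normalised weights with uniform exploration.
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / self.k
                for w in self.weights]

    def select(self):
        # Sample the policy to use for the next game.
        probs = self.distribution()
        arm = self.rng.choices(range(self.k), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # reward is in [0, 1] (1 = win); importance-weight it by 1 / prob.
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.k)
        # Rescale by the largest weight to avoid overflow; this leaves
        # the sampling distribution unchanged.
        largest = max(self.weights)
        self.weights = [w / largest for w in self.weights]


def play_games(win_rates, n_games=5000, gamma=0.1, seed=0):
    """Toy stand-in for an opponent: policy i wins with win_rates[i]."""
    bandit = Exp3Portfolio(len(win_rates), gamma=gamma, seed=seed)
    env = random.Random(seed + 1)
    wins = 0.0
    for _ in range(n_games):
        arm, prob = bandit.select()
        reward = 1.0 if env.random() < win_rates[arm] else 0.0
        bandit.update(arm, prob, reward)
        wins += reward
    return bandit.distribution(), wins / n_games


if __name__ == "__main__":
    dist, rate = play_games([0.2, 0.8, 0.3])
    print("learned distribution:", [round(p, 3) for p in dist])
    print("empirical win rate:", round(rate, 3))
```

In this sketch the only per-game cost is one sample and one weight update, which matches the abstract's point that the portfolio is strengthened by biasing the distribution over policies rather than by spending more computation per move.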
Document type:
Conference paper
CIG 2014 - Computational Intelligence in Games, Aug 2014, Dortmund, Germany. IEEE, pp.7, 2014, Computational Intelligence in Games. 〈10.1109/CIG.2014.6932897〉

https://hal.inria.fr/hal-01077628
Contributor: Olivier Teytaud
Submitted on: Monday, November 3, 2014 - 08:14:41
Last modified on: Thursday, April 5, 2018 - 12:30:12
Document(s) archived on: Wednesday, February 4, 2015 - 10:07:03

File

nashrand3.pdf
Files produced by the author(s)

Citation

David L. Saint-Pierre, Olivier Teytaud. Nash and the Bandit Approach for Adversarial Portfolios. CIG 2014 - Computational Intelligence in Games, Aug 2014, Dortmund, Germany. IEEE, pp.7, 2014, Computational Intelligence in Games. 〈10.1109/CIG.2014.6932897〉. 〈hal-01077628〉
