Nash and the Bandit Approach for Adversarial Portfolios

David L. Saint-Pierre; Olivier Teytaud

doi:10.1109/CIG.2014.6932897

Conference paper, Year: 2014

David L. Saint-Pierre

Role: Author

Machine Learning and Optimisation

Laboratoire de Recherche en Informatique

Olivier Teytaud

Role: Author
PersonId: 581
IdHAL: olivier-teytaud
IdRef: 05971008X

Machine Learning and Optimisation

Laboratoire de Recherche en Informatique

Abstract

In this paper we study the use of a portfolio of policies for adversarial problems. We use two different portfolios of policies and apply them to the game of Go. The first portfolio is composed of different versions of the GnuGo agent. The second portfolio is composed of fixed random seeds. First, we demonstrate that learning an offline combination of these policies using the notion of Nash equilibrium generates a stronger opponent. Second, we show that such distributions can be learned online through a bandit approach. The advantages of our approach are (i) diversity (the Nash-Portfolio is more variable than its components), (ii) adaptivity (the Bandit-Portfolio adapts to the opponent), (iii) simplicity (no computational overhead), and (iv) increased performance. Due to the importance of games on mobile devices, designing artificial intelligences for small computational power is crucial; our approach is particularly suited to mobile devices since it creates a stronger opponent simply by biasing the distribution over the policies, and moreover it generalizes quite well.
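
The two ideas in the abstract, an offline Nash mixture over a portfolio and an online bandit over the same portfolio, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the 3x3 win-rate matrix `PAYOFF` is hypothetical (it is not data from the paper), fictitious play stands in for whatever equilibrium solver the authors use, and Exp3 is used as one standard adversarial bandit, since the abstract does not say which bandit algorithm is applied.

```python
import math
import random

# Hypothetical 3x3 win-rate matrix (not from the paper): PAYOFF[i][j] is the
# probability that our policy i beats opponent policy j.
PAYOFF = [
    [0.5, 0.0, 1.0],
    [1.0, 0.5, 0.0],
    [0.0, 1.0, 0.5],
]


def fictitious_play(payoff, iterations=5000):
    """Approximate the row player's Nash mixture of a zero-sum matrix game."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] = col_counts[0] = 1  # arbitrary first move for both sides
    for _ in range(iterations):
        # Each side best-responds to the other's empirical play so far.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_counts[max(range(n_rows), key=row_values.__getitem__)] += 1
        col_counts[min(range(n_cols), key=col_values.__getitem__)] += 1
    total = sum(row_counts)
    return [count / total for count in row_counts]


def _exp3_probs(weights, gamma):
    """Mix the normalized weights with uniform exploration."""
    k, total = len(weights), sum(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def run_exp3(payoff, opponent, rounds=2000, gamma=0.1, seed=0):
    """Run Exp3 over our policies against one fixed opponent column.

    Returns the final sampling distribution over our policies.
    """
    rng = random.Random(seed)
    k = len(payoff)
    weights = [1.0] * k
    for _ in range(rounds):
        probs = _exp3_probs(weights, gamma)
        arm = rng.choices(range(k), weights=probs)[0]
        # Simulated game result: 1.0 for a win, 0.0 for a loss.
        reward = 1.0 if rng.random() < payoff[arm][opponent] else 0.0
        # Importance-weighted update of the played policy only.
        weights[arm] *= math.exp(gamma * reward / (probs[arm] * k))
        top = max(weights)
        weights = [w / top for w in weights]  # rescale to avoid overflow
    return _exp3_probs(weights, gamma)
```

Under these assumptions, `fictitious_play(PAYOFF)` should return a mixture close to uniform, because the hypothetical matrix is a symmetric cycle, while `run_exp3(PAYOFF, opponent=0)` should concentrate on policy 1, the best reply to opponent policy 0. In both cases only the sampling distribution over existing policies changes, which is consistent with the abstract's claim of no computational overhead at play time.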

Domains

Optimization and control [math.OC]

Main file

nashrand3.pdf (300.62 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01077628

Submitted on: Monday, November 3, 2014, 08:14:41

Last modified on: Thursday, April 18, 2024, 14:50:09

Long-term archiving on: Wednesday, February 4, 2015, 10:07:03

Dates and versions

hal-01077628, version 1 (03-11-2014)

Identifiers

HAL Id: hal-01077628, version 1
DOI: 10.1109/CIG.2014.6932897

Cite

David L. Saint-Pierre, Olivier Teytaud. Nash and the Bandit Approach for Adversarial Portfolios. CIG 2014 - Computational Intelligence in Games, IEEE, Aug 2014, Dortmund, Germany. pp.7, ⟨10.1109/CIG.2014.6932897⟩. ⟨hal-01077628⟩

Collections

EC-PARIS CNRS INRIA UMR8623 INRIA2 LRI-AO TDS-MACS UNIV-PARIS-SACLAY

213 Views

323 Downloads