
GuessWhat?! Visual object discovery through multi-modal dialogue

Abstract: We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, such as spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines for the introduced tasks.
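To make the game setup concrete, the loop the abstract describes (a questioner narrows down a hidden object with yes/no questions, an oracle answers on behalf of the target) can be sketched with simple rule-based players. The object fields, question templates, and both players below are illustrative assumptions, not the paper's dataset schema or its deep learning models:

```python
# Hypothetical sketch of the GuessWhat?! game loop described in the abstract.
# The Obj fields and the rule-based oracle/questioner are illustrative only.
from dataclasses import dataclass

@dataclass
class Obj:
    category: str
    left_half: bool  # crude stand-in for a spatial attribute

def oracle(target: Obj, question: str) -> str:
    """Answer yes/no questions about the hidden target object."""
    if question.startswith("is it a "):
        cat = question[len("is it a "):].rstrip("?")
        return "yes" if target.category == cat else "no"
    if question == "is it on the left?":
        return "yes" if target.left_half else "no"
    return "n/a"

def questioner(objects, target):
    """Narrow the candidate set with yes/no questions, then guess."""
    candidates = list(objects)
    dialogue = []
    questions = ["is it on the left?"] + [f"is it a {o.category}?" for o in objects]
    for q in questions:
        if len(candidates) == 1:
            break  # only one candidate left: make the guess
        a = oracle(target, q)
        dialogue.append((q, a))
        if q == "is it on the left?":
            candidates = [o for o in candidates if o.left_half == (a == "yes")]
        else:
            cat = q[len("is it a "):].rstrip("?")
            candidates = [o for o in candidates if (o.category == cat) == (a == "yes")]
    return candidates[0], dialogue

objects = [Obj("dog", True), Obj("cat", True), Obj("dog", False)]
target = objects[1]
guess, dialogue = questioner(objects, target)
print(guess.category)  # the questioner recovers the hidden object
```

In the dataset itself both roles are played by humans; the oracle and questioner tasks then ask models to reproduce each side of this exchange.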

Contributor: Florian Strub
Submitted on: Wednesday, June 28, 2017 - 11:35:00 PM
Last modification on: Friday, January 21, 2022 - 3:13:15 AM
Long-term archiving on: Thursday, January 18, 2018 - 2:54:39 AM




Distributed under a Creative Commons Attribution 4.0 International License


  • HAL Id: hal-01549641, version 1
  • arXiv: 1611.08481


Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, et al.. GuessWhat?! Visual object discovery through multi-modal dialogue. Conference on Computer Vision and Pattern Recognition, Jul 2017, Honolulu, United States. ⟨hal-01549641⟩


