Learning Visual Reasoning Without Strong Priors
Conference paper, Year: 2017


Abstract

Achieving artificial visual reasoning, the ability to answer image-related questions that require a multi-step, high-level process, is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing that it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.

Index Terms: Deep Learning, Language and Vision

Note: A full paper extending this study is available at http://arxiv.org/abs/1709.07871, with additional references, experiments, and analysis.
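The record itself contains no code, but as a rough illustration of the technique named in the abstract, the following is a minimal PyTorch sketch of Conditional Batch Normalization: feature maps are normalized without learned affine parameters, and a per-channel scale and shift are instead predicted from a conditioning vector such as a question embedding. All names (ConditionalBatchNorm2d, cond_dim) and dimensions are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    """Batch normalization whose affine parameters are predicted
    from a conditioning vector (e.g., a question embedding)."""

    def __init__(self, num_features, cond_dim):
        super().__init__()
        # Normalize without learned affine parameters; the affine
        # transform comes from the conditioning input instead.
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        # Linear layers predicting a per-channel scale and shift.
        self.to_gamma = nn.Linear(cond_dim, num_features)
        self.to_beta = nn.Linear(cond_dim, num_features)

    def forward(self, x, cond):
        # x: (batch, channels, height, width); cond: (batch, cond_dim)
        out = self.bn(x)
        gamma = self.to_gamma(cond).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(cond).unsqueeze(-1).unsqueeze(-1)
        # Scale around 1 so the layer starts near an identity
        # transform (an assumed initialization choice, not a
        # detail taken from this record).
        return (1 + gamma) * out + beta

# Illustrative usage with arbitrary dimensions.
features = torch.randn(8, 64, 14, 14)  # CNN feature maps
question = torch.randn(8, 128)         # e.g., a GRU-encoded question
cbn = ConditionalBatchNorm2d(num_features=64, cond_dim=128)
modulated = cbn(features, question)
print(modulated.shape)                 # torch.Size([8, 64, 14, 14])

Because the conditioning input sets the scale and shift of every channel, the same convolutional backbone can carry out different computations for different questions, which is the sense in which a general architecture with proper conditioning can learn to reason.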

Dates and versions

hal-01648684, version 1 (28-11-2017)

Identifiers

Cite

Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville. Learning Visual Reasoning Without Strong Priors. ICML 2017 Machine Learning in Speech and Language Processing Workshop, Aug 2017, Sydney, Australia. ⟨hal-01648684⟩