FiLM: Visual Reasoning with a General Conditioning Layer

Abstract: We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning (answering image-related questions that require a multi-step, high-level process), a task that has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve the state-of-the-art error on the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging new data from few examples or even zero-shot.
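The "feature-wise affine transformation" described above scales and shifts each feature map by a per-channel pair (gamma, beta) computed from the conditioning input, i.e. FiLM(F_c) = gamma_c * F_c + beta_c, with the same pair applied at every spatial position of channel c. The sketch below illustrates only that operation, in plain Python. It assumes a single linear projection as the generator that maps the conditioning vector to (gamma, beta); in the paper the conditioning input for visual reasoning comes from the question, and the helper names `linear`, `film_params`, and `apply_film` are hypothetical, not taken from the authors' code.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Compute weights @ x + bias using plain lists (one output per weight row)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def film_params(x: Vector, w_gamma: Matrix, b_gamma: Vector,
                w_beta: Matrix, b_beta: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical FiLM generator: map conditioning input x to per-channel (gamma, beta).

    Assumption: one linear projection each for gamma and beta.
    """
    return linear(w_gamma, b_gamma, x), linear(w_beta, b_beta, x)


def apply_film(features: Matrix, gamma: Vector, beta: Vector) -> Matrix:
    """Feature-wise affine modulation: channel c is scaled by gamma[c] and shifted by beta[c].

    `features` holds one flattened list of spatial activations per channel (C x H*W),
    so every position in a channel shares the same (gamma, beta) pair.
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need exactly one (gamma, beta) pair per feature channel")
    return [[g * v + b for v in channel]
            for channel, g, b in zip(features, gamma, beta)]


if __name__ == "__main__":
    # Two channels with three spatial positions each; a 2-d conditioning vector.
    features = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
    gamma, beta = film_params([1.0, 0.0],
                              [[2.0, 0.0], [0.0, 1.0]], [0.0, 1.0],
                              [[0.0, 0.0], [1.0, 0.0]], [1.0, 0.0])
    print(gamma, beta)                           # [2.0, 1.0] [1.0, 1.0]
    print(apply_film(features, gamma, beta))     # [[3.0, 5.0, 7.0], [1.5, 0.0, 5.0]]
```

Because gamma = 1 and beta = 0 leave a channel unchanged, a FiLM layer can fall back to the identity, while other values let the conditioning input amplify, suppress, or shift individual feature maps.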
Document type: Conference paper, Feb 2018, New Orleans, United States

Cited literature: 40 references
Contributor: Florian Strub
Submitted on: Tuesday, 28 November 2017, 03:13:55
Last modified on: Thursday, 11 January 2018, 06:27:32


  • HAL Id: hal-01648685, version 1
  • arXiv: 1707.03017



Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, Aaron Courville. FiLM: Visual Reasoning with a General Conditioning Layer. Feb 2018, New Orleans, United States. 〈hal-01648685〉