Conference Papers

FiLM: Visual Reasoning with a General Conditioning Layer

Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville

Abstract

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning (answering image-related questions which require a multi-step, high-level process), a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
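
As a rough illustration of the "feature-wise affine transformation based on conditioning information" described in the abstract, the sketch below scales and shifts each feature map by one pair of numbers computed from a conditioning vector. The abstract does not say how these numbers are produced, so the names `gamma`, `beta`, and `film_generator`, and the choice of a single linear map, are assumptions made for illustration only. Plain Python lists are used to keep the example dependency-free; a practical version would likely use an array library.

```python
def film_generator(conditioning, weight, bias):
    """Hypothetical generator: a linear map from a conditioning vector
    to one scale (gamma) and one shift (beta) per feature map."""
    params = [
        sum(c * w for c, w in zip(conditioning, row)) + b
        for row, b in zip(weight, bias)
    ]
    half = len(params) // 2
    return params[:half], params[half:]  # (gamma, beta)


def film(feature_maps, gamma, beta):
    """Apply a feature-wise affine transformation.

    feature_maps: list of C feature maps, each a 2D list (height x width).
    gamma, beta:  lists of C numbers, one scale and one shift per map.
    """
    return [
        [[g * value + b for value in row] for row in fmap]
        for fmap, g, b in zip(feature_maps, gamma, beta)
    ]


# Two 2x2 feature maps, e.g. from a convolutional layer.
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, -1.0], [1.0, 2.0]],
]

# Conditioning vector, e.g. an encoded question (assumed input).
conditioning = [1.0, 2.0]

# Illustrative weights: 4 outputs = 2 scales followed by 2 shifts.
weight = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
bias = [0.0, 0.0, 0.0, 1.0]

gamma, beta = film_generator(conditioning, weight, bias)
modulated = film(feature_maps, gamma, beta)

print(gamma, beta)  # [1.0, 2.0] [1.5, 1.0]
print(modulated)    # [[[2.5, 3.5], [4.5, 5.5]], [[1.0, -1.0], [3.0, 5.0]]]
```

Every value within a feature map shares the same scale and shift, so the conditioning input acts on whole features rather than on individual spatial positions.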

Dates and versions

hal-01648685, version 1 (28-11-2017)

Cite

Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville. FiLM: Visual Reasoning with a General Conditioning Layer. AAAI Conference on Artificial Intelligence, Feb 2018, New Orleans, United States. ⟨hal-01648685⟩