Conference paper, Year: 2018

FiLM: Visual Reasoning with a General Conditioning Layer

Abstract

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning (answering image-related questions which require a multi-step, high-level process), a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
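The feature-wise affine transformation described in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the names `linear` and `film`, the toy weights, and the use of a single linear map to produce the per-feature scale (`gamma`) and shift (`beta`) from the conditioning vector are all assumptions made for illustration. The only part taken from the abstract is the core idea that each feature map is scaled and shifted by values computed from conditioning information.

```python
from typing import List, Sequence

Vector = List[float]
FeatureMap = List[List[float]]  # H x W grid of activations for one channel


def linear(x: Sequence[float],
           weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a learned generator: returns W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(features: Sequence[FeatureMap],
         gamma: Sequence[float],
         beta: Sequence[float]) -> List[FeatureMap]:
    """Apply gamma[c] * features[c] + beta[c] at every position of channel c."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need exactly one (gamma, beta) pair per feature map")
    return [
        [[g * value + b for value in row] for row in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]


# Toy conditioning vector (assumed; in practice it would encode e.g. a question).
conditioning = [1.0, -1.0]

# Assumed toy weights for the two linear maps that produce gamma and beta.
gamma = linear(conditioning, [[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
beta = linear(conditioning, [[0.0, 0.0], [1.0, 0.0]], [0.5, 0.0])

# Two 2x2 feature maps (channels) to be modulated.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]

modulated = film(features, gamma, beta)
```

With these toy values, the first channel is doubled and shifted by 0.5, while the second is negated and shifted by 1. The same scale and shift apply at every spatial position of a channel, which is what makes the modulation feature-wise rather than position-wise.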

Dates and versions

hal-01648685, version 1 (28-11-2017)

Cite

Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville. FiLM: Visual Reasoning with a General Conditioning Layer. AAAI Conference on Artificial Intelligence, Feb 2018, New Orleans, United States. ⟨hal-01648685⟩