Preprint

Deep Features for Multimodal Emotion Classification

Abstract: Understanding the emotion a human experiences when perceiving audiovisual content is an exciting and important research avenue, and there have recently been growing attempts to predict the emotion elicited by video clips or movies. While most existing approaches either focus on a single modality, i.e., exploit only audio or only visual data, or build on a multimodal scheme with late fusion, we propose a multimodal framework with an early fusion scheme and target an emotion classification task. The proposed mechanism offers the advantages of handling (1) the variation in video length, (2) the imbalance between audio and visual feature sizes, and (3) the middle-level fusion of audio and visual information, such that a higher-level feature representation can be learned jointly from the two modalities for classification. We evaluate the proposed approach on an international benchmark, the MediaEval 2015 Affective Impact of Movies task, and show that it outperforms most state-of-the-art systems on arousal accuracy while using a much smaller feature size.
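Since this record carries no implementation details beyond the abstract, the following is only a minimal sketch, in PyTorch, of what an early-fusion scheme of this kind can look like: each modality is projected to a common size (addressing the audio/visual feature-size imbalance), mean-pooled over time (addressing variable video length), and the two are concatenated into a jointly learned representation for classification. Every dimension, layer size, and the three-class output below are illustrative assumptions, not the architecture from the paper.

    import torch
    import torch.nn as nn

    class EarlyFusionClassifier(nn.Module):
        # Hypothetical early-fusion model; all sizes are assumptions,
        # not the configuration reported in the paper.
        def __init__(self, audio_dim=128, visual_dim=4096,
                     fused_dim=256, num_classes=3):
            super().__init__()
            # Separate projections counter the imbalance between the
            # (typically small) audio and (typically large) visual features.
            self.audio_proj = nn.Linear(audio_dim, fused_dim)
            self.visual_proj = nn.Linear(visual_dim, fused_dim)
            # Middle-level fusion: a joint representation learned from the
            # concatenated modalities, followed by the classifier head.
            self.fusion = nn.Sequential(
                nn.Linear(2 * fused_dim, fused_dim),
                nn.ReLU(),
                nn.Linear(fused_dim, num_classes),
            )

        def forward(self, audio_seq, visual_seq):
            # audio_seq: (batch, T_audio, audio_dim)
            # visual_seq: (batch, T_visual, visual_dim)
            # Mean pooling over the time axis makes the model
            # independent of clip length.
            audio = self.audio_proj(audio_seq).mean(dim=1)
            visual = self.visual_proj(visual_seq).mean(dim=1)
            return self.fusion(torch.cat([audio, visual], dim=-1))

    if __name__ == "__main__":
        model = EarlyFusionClassifier()
        audio = torch.randn(2, 50, 128)     # two clips, 50 audio windows each
        visual = torch.randn(2, 120, 4096)  # two clips, 120 frames each
        print(model(audio, visual).shape)   # torch.Size([2, 3])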

Cited literature: 24 references

https://hal.inria.fr/hal-01289191
Contributor: Ngoc Duong
Submitted on: Wednesday, March 16, 2016 - 11:53:16 AM
Last modification on: Monday, June 20, 2016 - 9:33:14 AM
Document(s) archived on: Sunday, November 13, 2016 - 7:44:16 PM

File

ICIPpaper.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01289191, version 1

Citation

Shriman Narayan Tiwari, Ngoc Q. K. Duong, Frédéric Lefebvre, Claire-Hélène Demarty, Benoit Huet, et al. Deep Features for Multimodal Emotion Classification. 2016. ⟨hal-01289191⟩
