Preprint, Working Paper. Year: 2016

DEEP FEATURES FOR MULTIMODAL EMOTION CLASSIFICATION

Abstract

Understanding the human emotion elicited when perceiving audiovisual content is an exciting and important research avenue, and there have recently been emerging attempts to predict the emotion elicited by video clips or movies. While most existing approaches either focus on a single modality, i.e., exploit only audio or visual data, or build on a multimodal scheme with late fusion, we propose a multimodal framework with an early fusion scheme and target an emotion classification task. Our proposed mechanism offers the advantages of handling (1) the variation in video length, (2) the imbalance of audio and visual feature sizes, and (3) the middle-level fusion of audio and visual information, such that a higher-level feature representation can be learned jointly from the two modalities for classification. We evaluate the performance of the proposed approach on an international benchmark, the MediaEval 2015 Affective Impact of Movies task, and show that it outperforms most state-of-the-art systems on arousal accuracy while using a much smaller feature size.
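The abstract's three design points can be illustrated with a short sketch of an early (middle-level) fusion classifier. The following PyTorch snippet is a minimal illustration under stated assumptions, not the authors' implementation: the feature dimensions, layer widths, mean-pooling over time, and the three-way arousal labels are all assumptions made for the example.

```python
# Minimal sketch of early (middle-level) audio-visual fusion, assuming PyTorch.
# All sizes are illustrative; this is not the paper's exact architecture.
import torch
import torch.nn as nn

class EarlyFusionClassifier(nn.Module):
    def __init__(self, audio_dim=64, visual_dim=4096, fused_dim=256, n_classes=3):
        super().__init__()
        # Project each modality to a common size, countering the imbalance
        # between audio and visual feature dimensions before fusing.
        self.audio_proj = nn.Linear(audio_dim, fused_dim)
        self.visual_proj = nn.Linear(visual_dim, fused_dim)
        # A joint layer learns a higher-level representation from the
        # concatenated modalities, then classifies.
        self.joint = nn.Sequential(
            nn.Linear(2 * fused_dim, fused_dim), nn.ReLU(),
            nn.Linear(fused_dim, n_classes),
        )

    def forward(self, audio, visual):
        # audio: (batch, T_a, audio_dim); visual: (batch, T_v, visual_dim).
        # Mean-pooling over the time axis yields one fixed-size vector per
        # clip, so videos of different lengths share the same classifier.
        a = self.audio_proj(audio.mean(dim=1))
        v = self.visual_proj(visual.mean(dim=1))
        return self.joint(torch.cat([a, v], dim=-1))

model = EarlyFusionClassifier()
logits = model(torch.randn(2, 100, 64), torch.randn(2, 30, 4096))
print(logits.shape)  # (2, 3): one score per hypothetical arousal class
```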
Main file: ICIPpaper.pdf (268.29 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01289191, version 1 (16-03-2016)

Identifiers

  • HAL Id: hal-01289191, version 1

Cite

Shriman Narayan Tiwari, Ngoc Q. K. Duong, Frédéric Lefebvre, Claire-Hélène Demarty, Benoit Huet, et al. DEEP FEATURES FOR MULTIMODAL EMOTION CLASSIFICATION. 2016. ⟨hal-01289191⟩

Collections

EURECOM
267 views
401 downloads
