DEEP FEATURES FOR MULTIMODAL EMOTION CLASSIFICATION

Abstract: Understanding the emotion a viewer experiences when perceiving audiovisual content is an exciting and important research avenue, and attempts to predict the emotion elicited by video clips or movies have recently emerged. While most existing approaches either focus on a single modality, i.e., exploit only audio or only visual data, or build on a multimodal scheme with late fusion, we propose a multimodal framework with an early fusion scheme and target an emotion classification task. The proposed mechanism offers three advantages: it handles (1) the variation in video length, (2) the imbalance between audio and visual feature sizes, and (3) the mid-level fusion of audio and visual information, so that a higher-level feature representation can be learned jointly from the two modalities for classification. We evaluate the proposed approach on an international benchmark, the MediaEval 2015 Affective Impact of Movies task, and show that it outperforms most state-of-the-art systems on arousal accuracy while using a much smaller feature size.
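The abstract names the fusion scheme but not the architecture. As an illustration only, a minimal PyTorch sketch of such an early-fusion classifier might look as follows; every layer size, the mean-pooling choice, and all names below are assumptions made for this sketch, not the authors' actual design.

```python
# Hypothetical early-fusion audio-visual classifier. Assumes per-segment
# audio features and per-frame visual features have already been extracted;
# all dimensions and layer choices are illustrative, not from the paper.
import torch
import torch.nn as nn

class EarlyFusionClassifier(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=2048, shared_dim=256, n_classes=3):
        super().__init__()
        # Separate projections balance the unequal audio/visual feature
        # sizes before fusion (advantage 2 in the abstract).
        self.audio_proj = nn.Linear(audio_dim, shared_dim)
        self.visual_proj = nn.Linear(visual_dim, shared_dim)
        # Joint layer learns a higher-level representation from both
        # modalities together (advantage 3).
        self.fusion = nn.Sequential(
            nn.Linear(2 * shared_dim, shared_dim),
            nn.ReLU(),
        )
        self.head = nn.Linear(shared_dim, n_classes)

    def forward(self, audio_feats, visual_feats):
        # audio_feats: (batch, T_a, audio_dim); visual_feats: (batch, T_v, visual_dim).
        # Mean-pooling over time collapses clips of different lengths to
        # fixed-size vectors (advantage 1).
        a = self.audio_proj(audio_feats.mean(dim=1))
        v = self.visual_proj(visual_feats.mean(dim=1))
        fused = self.fusion(torch.cat([a, v], dim=-1))  # mid-level (early) fusion
        return self.head(fused)

model = EarlyFusionClassifier()
audio = torch.randn(4, 20, 128)    # 4 clips, 20 audio segments each
visual = torch.randn(4, 50, 2048)  # 4 clips, 50 video frames each
logits = model(audio, visual)      # (4, 3) class scores, e.g., arousal levels
```

Mean-pooling over time is only one simple way to absorb variable clip lengths; the key point is that the two modalities are merged before, not after, classification.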

https://hal.inria.fr/hal-01289191
Contributor: Ngoc Duong
Submitted on: Wednesday, March 16, 2016

Identifiers

  • HAL Id: hal-01289191, version 1

Citation

Shriman Narayan Tiwari, Ngoc Q. K. Duong, Frédéric Lefebvre, Claire-Hélène Demarty, Benoit Huet, et al. DEEP FEATURES FOR MULTIMODAL EMOTION CLASSIFICATION. 2016. ⟨hal-01289191⟩
