Bidirectional Joint Representation Learning with Symmetrical Deep Neural Networks for Multimodal and Crossmodal Applications

Vedran Vukotic (1), Christian Raymond (1), Guillaume Gravier (1)
(1) LinkMedia - Creating and exploiting explicit links between multimedia fragments
IRISA-D6 - MEDIA ET INTERACTIONS, Inria Rennes – Bretagne Atlantique
Abstract: Common approaches to problems involving multiple modalities (classification, retrieval, hyperlinking, etc.) are early fusion of the initial modalities and crossmodal translation from one modality to the other. Recently, deep neural networks, especially deep autoencoders, have proven promising both for crossmodal translation and for early fusion via multimodal embedding. In this work, we propose a flexible crossmodal deep neural network architecture for multimodal and crossmodal representation. By tying the weights of two deep neural networks, symmetry is enforced in the central hidden layers, thus yielding a multimodal representation space common to the two original representation spaces. The proposed architecture is evaluated in multimodal query expansion and multimodal retrieval tasks within the context of video hyperlinking. Our method demonstrates improved crossmodal translation capabilities and produces a multimodal embedding that significantly outperforms multimodal embeddings obtained by deep autoencoders, resulting in an absolute increase of 14.14 in precision at 10 on a video hyperlinking task.
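
The weight tying described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the tying scheme (the two central weight matrices are reused, transposed and in reverse order, by the opposite network), the layer sizes, the tanh activation, the omission of biases and training, and all names such as `TiedTranslationPair` are assumptions made for illustration. The multimodal embedding is taken here as the concatenation of the two central activations, which is also an assumption.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def dense(weights, x):
    """Compute tanh(W x); biases are omitted to keep the sketch short."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


class TiedTranslationPair:
    """Two translation networks (A->B and B->A) sharing central weights.

    Assumed layout per direction: input -> hidden -> central -> hidden -> output.
    """

    def __init__(self, dim_a, dim_b, hidden, central, seed=0):
        rng = random.Random(seed)
        # Modality-specific outer layers (not shared).
        self.in_a = make_matrix(hidden, dim_a, rng)
        self.in_b = make_matrix(hidden, dim_b, rng)
        self.out_a = make_matrix(dim_a, hidden, rng)
        self.out_b = make_matrix(dim_b, hidden, rng)
        # Central matrices, shared by both directions (assumed tying scheme).
        self.w_down = make_matrix(central, hidden, rng)  # hidden -> central
        self.w_up = make_matrix(hidden, central, rng)    # central -> hidden

    def a_to_b(self, x_a):
        central = dense(self.w_down, dense(self.in_a, x_a))
        output = dense(self.out_b, dense(self.w_up, central))
        return central, output

    def b_to_a(self, x_b):
        # Tied weights: the same two matrices, transposed and in reverse order.
        central = dense(transpose(self.w_up), dense(self.in_b, x_b))
        output = dense(self.out_a, dense(transpose(self.w_down), central))
        return central, output

    def embed(self, x_a, x_b):
        """Concatenate the central activations of both directions."""
        central_a, _ = self.a_to_b(x_a)
        central_b, _ = self.b_to_a(x_b)
        return central_a + central_b


# Hypothetical toy dimensions: modality A has 6 features, modality B has 4.
net = TiedTranslationPair(dim_a=6, dim_b=4, hidden=5, central=3)
x_a = [0.1 * i for i in range(6)]
x_b = [0.2 * i for i in range(4)]
_, predicted_b = net.a_to_b(x_a)   # crossmodal translation A -> B
_, predicted_a = net.b_to_a(x_b)   # crossmodal translation B -> A
embedding = net.embed(x_a, x_b)    # joint representation of length 2 * central
print(len(predicted_b), len(predicted_a), len(embedding))
```

Because both directions read the same `w_down` and `w_up` objects, any update to one network's central layers would change the other's as well, which is one way the two central representations can be pushed toward a common space.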
Document type: Conference paper
ICMR, Jun 2016, New York, United States. 2016

https://hal.inria.fr/hal-01314302
Contributor: Vedran Vukotić
Submitted on: Wednesday, May 11, 2016 - 10:41:38
Last modified on: Wednesday, August 2, 2017 - 10:06:59
Document(s) archived on: Wednesday, November 16, 2016 - 00:30:47

File

vukotic_BiDNN.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01314302, version 1

Citation

Vedran Vukotic, Christian Raymond, Guillaume Gravier. Bidirectional Joint Representation Learning with Symmetrical Deep Neural Networks for Multimodal and Crossmodal Applications. ICMR, Jun 2016, New York, United States. 2016. <hal-01314302>

Metrics

Record views: 751
Document downloads: 512