Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking

Vedran Vukotić; Christian Raymond; Guillaume Gravier

doi:10.1145/3078971.3079038

Conference paper. Year: 2017

Vedran Vukotić

Role: Author
PersonId : 8581
IdHAL : vvukotic

Creating and exploiting explicit links between multimedia fragments

Christian Raymond

Role: Author
PersonId : 1778
IdHAL : christian-raymond
IdRef : 099236486

Creating and exploiting explicit links between multimedia fragments

Guillaume Gravier

Role: Author
PersonId : 1046
IdHAL : guig
ORCID : 0000-0002-2266-5682
IdRef : 110355415

Abstract

Continuous multimodal representations suitable for multimodal information retrieval are usually obtained with methods that rely heavily on multimodal autoencoders. In video hyperlinking, a task that aims at retrieving video segments, the state of the art is a variation of two interlocked networks working in opposing directions. These systems provide good multimodal embeddings and are also capable of translating from one representation space to the other. Because they operate on representation spaces, these networks cannot operate in the original spaces (text or image), which makes it difficult to visualize the crossmodal function, and they do not generalize well to unseen data. Recently, generative adversarial networks (GANs) have gained popularity and have been used for generating realistic synthetic data and for obtaining high-level, single-modal latent representation spaces. In this work, we evaluate the feasibility of using GANs to obtain multimodal representations. We show that GANs can be used for multimodal representation learning and that they provide multimodal representations that are superior to those obtained with multimodal autoencoders. Additionally, we illustrate the ability to visualize crossmodal translations, which can provide human-interpretable insights into learned GAN-based video hyperlinking models.

Keywords

video hyperlinking; multimodal fusion; generative adversarial networks; trecvid; multimodal embedding; multimodal autoencoders; representation learning; unsupervised learning; neural networks

Domains

Multimedia [cs.MM]; Neural networks [cs.NE]; Information retrieval [cs.IR]

Main file

Vukotic_ICMR_2017.pdf (801.08 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01522419

Submitted on: Monday, 15 May 2017, 09:34:39

Last modified on: Friday, 24 March 2023, 14:53:04

Long-term archiving on: Thursday, 17 August 2017, 00:20:17

Dates and versions

hal-01522419 , version 1 (15-05-2017)

Identifiers

HAL Id : hal-01522419 , version 1
DOI : 10.1145/3078971.3079038

Cite

Vedran Vukotić, Christian Raymond, Guillaume Gravier. Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking. ACM International Conference on Multimedia Retrieval (ICMR) 2017, ACM, Jun 2017, Bucharest, Romania. ⟨10.1145/3078971.3079038⟩. ⟨hal-01522419⟩

Collections

UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-INSA-R INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM
