Localizing Moments in Video with Natural Language

Lisa Anne Hendricks; Oliver Wang; Eli Shechtman; Josef Sivic; Trevor Darrell; Bryan Russell

Conference paper, Year: 2017


Lisa Anne Hendricks

Role: Author

Adobe Research

Lawrence Berkeley National Laboratory [Berkeley]

Oliver Wang

Role: Author

Adobe Research

Eli Shechtman

Role: Author

Adobe Research

Josef Sivic

Role: Author
PersonId : 945630

Models of visual object recognition and scene understanding

Université Paris Sciences et Lettres

Adobe Research

Czech Institute of Informatics, Robotics and Cybernetics [Prague]

Trevor Darrell

Role: Author
PersonId : 1001833

Lawrence Berkeley National Laboratory [Berkeley]

Bryan Russell

Role: Author

Adobe Research

Abstract

We consider retrieving a specific temporal segment, or moment, from a video given a natural language text description. Methods designed to retrieve whole video clips with natural language determine what occurs in a video but not when. To address this issue, we propose the Moment Context Network (MCN) which effectively localizes natural language queries in videos by integrating local and global video features over time. A key obstacle to training our MCN model is that current video datasets do not include pairs of localized video segments and referring expressions, or text descriptions which uniquely identify a corresponding moment. Therefore, we collect the Distinct Describable Moments (DiDeMo) dataset which consists of over 10,000 unedited, personal videos in diverse visual settings with pairs of localized video segments and referring expressions. We demonstrate that MCN outperforms several baseline methods and believe that our initial results together with the release of DiDeMo will inspire further research on localizing video moments with natural language.
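The retrieval idea in the abstract, scoring candidate moments by combining local and global video features, can be illustrated with a minimal sketch. This is not the authors' MCN implementation. It assumes that the video is already split into fixed-length chunks with one feature vector each, that the text query has already been embedded into the same space as the moment features, and that candidates are ranked by squared distance. All function names (`mean_pool`, `candidate_moments`, `moment_feature`, `localize`) are hypothetical, and no learned layers are included.

```python
from itertools import combinations_with_replacement
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(rows: Sequence[Vector]) -> Vector:
    """Average a non-empty sequence of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def candidate_moments(num_chunks: int) -> List[Tuple[int, int]]:
    """Enumerate every contiguous chunk range as (start, end), end inclusive."""
    return list(combinations_with_replacement(range(num_chunks), 2))


def moment_feature(chunks: Sequence[Vector], start: int, end: int) -> Vector:
    """Concatenate local context (inside the moment) with global context (whole video)."""
    local = mean_pool(chunks[start:end + 1])
    global_context = mean_pool(chunks)
    return local + global_context


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def localize(chunks: Sequence[Vector], query: Vector) -> Tuple[int, int]:
    """Return the candidate moment whose feature is closest to the query embedding.

    Assumption: `query` already lives in the same space as `moment_feature`
    output; a real system would learn both embeddings jointly.
    """
    return min(
        candidate_moments(len(chunks)),
        key=lambda m: squared_distance(moment_feature(chunks, m[0], m[1]), query),
    )


if __name__ == "__main__":
    # Toy video with three chunks and 2-dimensional features (made-up values).
    video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Use the feature of chunks 1..2 as a stand-in for an embedded text query.
    query = moment_feature(video, 1, 2)
    print(localize(video, query))
```

Including the whole-video average in every candidate's feature lets the score depend on how a segment differs from the rest of the video, which is one plausible reading of "local and global video features" in the abstract.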

Domains

Computer Vision and Pattern Recognition [cs.CV]

Contributor: Josef Sivic

https://inria.hal.science/hal-01678699

Submitted on: Tuesday, 9 January 2018, 13:05:52

Last modified on: Friday, 19 April 2024, 16:18:58

Dates and versions

hal-01678699, version 1 (09-01-2018)

Identifiers

HAL Id: hal-01678699, version 1
arXiv: 1708.01641

Cite

Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, et al. Localizing Moments in Video with Natural Language. IEEE International Conference on Computer Vision - ICCV 2017, Oct 2017, Venice, Italy. ⟨hal-01678699⟩


Collections

ENS-PARIS UNIV-RENNES1 CNRS INRIA IRISA INRIA2 PSL UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

295 views

0 downloads
