Theses

Large-scale Learning from Video and Natural Language

Antoine Miech 1
1 WILLOW - Models of visual object recognition and scene understanding
DI-ENS - Département d'informatique de l'École normale supérieure, Inria de Paris
Abstract : The goal of this thesis is to build and train machine learning models capable of understanding the content of videos. Current video understanding approaches mainly rely on large-scale manually annotated video datasets for training. However, collecting and annotating such datasets is cumbersome, expensive and time-consuming. To address this issue, this thesis focuses on leveraging large amounts of readily available but noisy annotations in the form of natural language. In particular, we exploit a diverse corpus of textual metadata such as movie scripts, web video titles and descriptions, or automatically transcribed speech obtained from narrated videos. Training video models on such readily available textual data is challenging because the annotation is often imprecise or wrong. In this thesis, we introduce learning approaches to deal with weak annotation and design specialized training objectives and neural network architectures.

https://hal.inria.fr/tel-03084216
Contributor : Antoine Miech
Submitted on : Sunday, December 20, 2020 - 9:43:25 PM
Last modification on : Tuesday, December 22, 2020 - 3:33:27 AM

File

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-03084216, version 1

Citation

Antoine Miech. Large-scale Learning from Video and Natural Language. Computer Vision and Pattern Recognition [cs.CV]. PSL Research University, 2020. English. ⟨tel-03084216⟩

Metrics

Record views : 52
File downloads : 150