Preprint, working paper. Year: 2016

Learning Semantic Segmentation with Weakly-Annotated Videos

Abstract

Fully convolutional neural networks (FCNNs) trained on a large number of images with strong pixel-level annotations have become the new state of the art for the semantic segmentation task. While there have been recent attempts to learn FCNNs from image-level weak annotations, they need additional constraints, such as the size of an object, to obtain reasonable performance. To address this issue, we present motion-CNN (M-CNN), a novel FCNN framework which incorporates motion cues and is learned from video-level weak annotations. Our learning scheme to train the network uses motion segments as soft constraints, thereby handling noisy motion information. When trained on weakly-annotated videos, our method outperforms the state-of-the-art EM-Adapt approach on the PASCAL VOC 2012 image segmentation benchmark. We also demonstrate that the performance of M-CNN learned with 150 weak video annotations is on par with state-of-the-art weakly-supervised methods trained with thousands of images. Finally, M-CNN substantially outperforms recent approaches on the related task of video co-localization on the YouTube-Objects dataset. This is an extended version of our ECCV 2016 paper.
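The abstract's key idea, using noisy motion segments as soft rather than hard training targets, can be made concrete with a short sketch. The snippet below is a minimal illustration in PyTorch (a framework assumption; the paper does not prescribe one), and the function soft_motion_loss, its alpha confidence parameter, and the uniform label smoothing are hypothetical, not the authors' exact formulation.

import torch
import torch.nn.functional as F

def soft_motion_loss(logits, motion_mask, video_label, alpha=0.9):
    """Hypothetical soft-constraint loss for weak video supervision.

    A pixel inside the motion segment is treated as the video's class
    with confidence alpha; a pixel outside it as background (class 0)
    with the same confidence. Because alpha < 1, a noisy motion mask
    acts as a soft constraint rather than a hard pixel label.

    logits:      (C, H, W) per-pixel class scores from the FCNN
    motion_mask: (H, W) binary foreground mask from motion segmentation
    video_label: class index given by the video-level weak annotation
    """
    C, H, W = logits.shape
    # Spread the remaining (1 - alpha) mass uniformly over the other
    # classes, so each per-pixel target distribution sums to 1.
    target = torch.full((C, H, W), (1.0 - alpha) / (C - 1))
    fg = motion_mask.bool()
    target[video_label][fg] = alpha   # foreground: the video's class
    target[0][~fg] = alpha            # background: class 0
    # Cross-entropy between the soft targets and the prediction.
    log_probs = F.log_softmax(logits, dim=0)
    return -(target * log_probs).sum(dim=0).mean()

# Toy usage with 21 PASCAL VOC classes (background + 20 objects).
logits = torch.randn(21, 64, 64, requires_grad=True)
motion_mask = (torch.rand(64, 64) > 0.5).float()
loss = soft_motion_loss(logits, motion_mask, video_label=15)
loss.backward()

In the paper, the interaction between network predictions and motion segments is handled within an EM-style learning scheme; this sketch only shows how a soft target distribution built from a motion mask differs from a hard pixel label.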
Main file: mcnn.pdf (2.49 MB). Origin: files produced by the author(s).

Dates and versions

hal-01292794 , version 1 (23-03-2016)
hal-01292794 , version 2 (28-07-2016)
hal-01292794 , version 3 (02-08-2016)

Identifiers

hal-01292794

Cite

Pavel Tokmakov, Karteek Alahari, Cordelia Schmid. Learning Semantic Segmentation with Weakly-Annotated Videos. 2016. ⟨hal-01292794v2⟩
1774 views
1863 downloads

