A flexible model for training action localization with varying levels of supervision

Guilhem Chéron (1, 2), Jean-Baptiste Alayrac (1), Ivan Laptev (1), Cordelia Schmid (2)
(1) WILLOW - Models of visual object recognition and scene understanding
DI-ENS - Computer Science Department of the École normale supérieure, Inria Paris
(2) Thoth - Learning models from massive data
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann
Abstract: Spatio-temporal action detection in videos is typically addressed in a fully supervised setup, with manual annotation of training videos required at every frame. Since such annotation is extremely tedious and prohibits scalability, there is a clear need to minimize the amount of manual supervision. In this work we propose a unifying framework that can handle and combine varying types of less demanding weak supervision. Our model is based on discriminative clustering and integrates different types of supervision as constraints on the optimization. We investigate applications of such a model to training setups with alternative supervisory signals, ranging from video-level class labels to full per-frame annotation of action bounding boxes. Experiments on the challenging UCF101-24 and DALY datasets demonstrate competitive performance of our method at a fraction of the supervision used by previous methods. The flexibility of our model enables joint learning from data with different levels of annotation. Experimental results demonstrate a significant gain from adding a few fully supervised examples to otherwise weakly labeled videos.
Document type: Conference paper
NIPS 2018 - 32nd Conference on Neural Information Processing Systems, Dec 2018, Montréal, Canada. pp.1-17

https://hal.inria.fr/hal-01937002
Contributor: Guilhem Chéron
Submitted on: Tuesday, 27 November 2018, 18:55:20
Last modified on: Monday, 3 December 2018, 11:05:55

Identifiers

  • HAL Id: hal-01937002, version 1
  • arXiv: 1806.11328

Citation

Guilhem Chéron, Jean-Baptiste Alayrac, Ivan Laptev, Cordelia Schmid. A flexible model for training action localization with varying levels of supervision. NIPS 2018 - 32nd Conference on Neural Information Processing Systems, Dec 2018, Montréal, Canada. pp.1-17. 〈hal-01937002〉
