Mining visual actions from movies

Adrien Gaidon; Marcin Marszalek; Cordelia Schmid

doi:10.5244/C.23.125

Conference paper, Year: 2009


Adrien Gaidon

Role: Corresponding author
PersonId : 865483


Learning and recognition in vision

Microsoft Research - Inria Joint Centre

Marcin Marszalek

Role: Author

Computing Science Laboratory - Oxford University

Cordelia Schmid

Role: Author
PersonId : 831154

Learning and recognition in vision

Abstract

This paper presents an approach for mining visual actions from real-world videos. Given a large number of movies, we want to automatically extract short video sequences corresponding to visual human actions. First, we retrieve actions by mining verbs extracted from the transcripts aligned with the videos. Not all of these samples visually characterize the action; therefore, we rank these videos by visual consistency. We investigate two unsupervised outlier detection methods: one-class Support Vector Machine (SVM) and densest component estimation of a similarity graph. Alternatively, we show how to use automatic weak supervision provided by a random background class, either by directly applying a binary SVM, or by using an iterative re-training scheme for Support Vector Regression machines (SVR). Experimental results explore actions in 144 episodes of the TV series "Buffy the Vampire Slayer" and show: (a) the applicability of our approach to a large-scale set of real-world videos, (b) the importance of visual consistency for ranking videos retrieved from text, (c) the added value of random non-action samples, and (d) the ability of our iterative SVR re-training algorithm to handle weak supervision. The quality of the rankings obtained is assessed on manually annotated data for six different action classes.
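As a rough illustration of ranking text-retrieved clips by visual consistency with a similarity graph, the sketch below uses greedy peeling, a standard approximation for finding a dense subgraph. It is not necessarily the densest component estimator used in the paper. The function names, the Gaussian similarity, the `gamma` parameter, and the toy 2-D descriptors are all assumptions made for this example; real video descriptors would be much higher-dimensional.

```python
import math


def rbf_similarity(x, y, gamma=1.0):
    """Gaussian similarity between two feature vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def rank_by_greedy_peeling(features, gamma=1.0):
    """Rank samples by consistency with the rest, most consistent first.

    Repeatedly removes the sample with the lowest weighted degree in the
    similarity graph, so samples removed late sit in the dense core.
    Also returns the indices of the densest subgraph seen while peeling,
    where density is total edge weight divided by the number of nodes.
    """
    n = len(features)
    sim = [[0.0 if i == j else rbf_similarity(features[i], features[j], gamma)
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in sim]
    alive = set(range(n))
    total_weight = sum(degree) / 2.0  # every edge is counted twice in degrees
    removal_order = []
    best_density, best_size = -1.0, n
    while alive:
        density = total_weight / len(alive)
        if density > best_density:
            best_density, best_size = density, len(alive)
        # Peel off the least connected remaining sample (ties: lowest index).
        weakest = min(alive, key=lambda i: (degree[i], i))
        alive.remove(weakest)
        removal_order.append(weakest)
        total_weight -= degree[weakest]
        for j in alive:
            degree[j] -= sim[weakest][j]
    ranking = removal_order[::-1]
    # The densest set seen is made of the last `best_size` samples removed.
    return ranking, sorted(ranking[:best_size])


# Toy data (hypothetical): four mutually similar clips and two outliers.
toy_features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
                (5.0, 5.0), (-4.0, 6.0)]
toy_ranking, toy_core = rank_by_greedy_peeling(toy_features)
print("ranking (most consistent first):", toy_ranking)
print("dense core:", toy_core)
```

In this toy setting the two distant points are peeled first and therefore end up at the bottom of the ranking, which mirrors the idea that retrieved samples not showing the action are visually inconsistent with the rest.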

Keywords

LEAR, MSR-INRIA, human actions, visual consistency, iter-SVR, videos, movies, Buffy, action recognition, retrieval, ranking

Domains

Computer Vision and Pattern Recognition [cs.CV], Machine Learning [cs.LG]

Main file

gaidon_mining_actions_bmvc2009.pdf (1.6 MB)

top_punch_02.png (452.01 KB)

one_pager_mining_actions_bmvc09.pdf (1.25 MB)

poster_buffy_BMVC09.png (11.87 MB)

top_10_fall.avi (7.74 MB)

top_10_get_up.avi (7.12 MB)

top_10_walk.avi (4.7 MB)

Origin: Files produced by the author(s)

Format: Figure, Image

Format: Other

Contributor: THOTH Team

https://inria.hal.science/inria-00440973

Submitted on: Wednesday, April 25, 2012, 13:50:00

Last modified on: Thursday, April 4, 2024, 18:17:53

Long-term archiving on: Tuesday, December 13, 2016, 17:45:03

Dates and versions

inria-00440973 , version 1 (14-12-2009)

inria-00440973 , version 2 (25-04-2012)

Identifiers

HAL Id : inria-00440973 , version 2
DOI : 10.5244/C.23.125

Cite

Adrien Gaidon, Marcin Marszalek, Cordelia Schmid. Mining visual actions from movies. British Machine Vision Conference, British Machine Vision Association, Sep 2009, London, United Kingdom. pp.125.1-125.11, ⟨10.5244/C.23.125⟩. ⟨inria-00440973v2⟩


Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA LJK LJK_GI LJK_GI_LEAR INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

670 views

858 downloads
