Mining visual actions from movies

Adrien Gaidon 1, 2, * Marcin Marszalek 3 Cordelia Schmid 1
* Corresponding author
1 LEAR - Learning and recognition in vision
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract : This paper presents an approach for mining visual actions from real-world videos. Given a large number of movies, we want to automatically extract short video sequences corresponding to visual human actions. First, we retrieve candidate action samples by mining verbs extracted from the transcripts aligned with the videos. Not all of these samples visually characterize the action; we therefore rank the retrieved videos by visual consistency. We investigate two unsupervised outlier detection methods: one-class Support Vector Machine (SVM) and densest component estimation of a similarity graph. Alternatively, we show how to use automatic weak supervision provided by a random background class, either by directly applying a binary SVM, or by using an iterative re-training scheme for Support Vector Regression (SVR) machines. Experimental results explore actions in 144 episodes of the TV series "Buffy the Vampire Slayer" and show: (a) the applicability of our approach to a large-scale set of real-world videos, (b) the importance of visual consistency for ranking videos retrieved from text, (c) the added value of random non-action samples and (d) the ability of our iterative SVR re-training algorithm to handle weak supervision. The quality of the rankings obtained is assessed on manually annotated data for six different action classes.
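As an illustration of ranking retrieved samples by visual consistency with a similarity graph, the sketch below uses greedy peeling, a common heuristic for approximating a dense component. This is an assumption made for illustration and is not necessarily the estimation procedure used in the paper. The function name `rank_by_greedy_peeling` and the toy similarity matrix are hypothetical; in practice the similarities would come from visual descriptors of the video clips.

```python
from typing import List


def rank_by_greedy_peeling(sim: List[List[float]]) -> List[int]:
    """Rank samples from most to least consistent (illustrative heuristic).

    `sim` is a symmetric matrix of non-negative pairwise similarities
    (hypothetical input; the diagonal is ignored). The node with the
    smallest weighted degree in the remaining graph is removed at each
    step, so samples removed last are the most mutually similar.
    """
    n = len(sim)
    remaining = set(range(n))
    # Weighted degree of each node with respect to all other nodes.
    degree = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    removal_order = []
    while remaining:
        # Ties are broken by the lower index to keep the result deterministic.
        worst = min(remaining, key=lambda i: (degree[i], i))
        remaining.remove(worst)
        removal_order.append(worst)
        # Update degrees so they refer only to the nodes still in the graph.
        for j in remaining:
            degree[j] -= sim[worst][j]
    # Reverse: last removed (most consistent) comes first in the ranking.
    return removal_order[::-1]


# Toy example: samples 0, 1 and 2 resemble each other, sample 3 does not.
example_sim = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.85, 0.2],
    [0.8, 0.85, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
]
ranking = rank_by_greedy_peeling(example_sim)
print(ranking)
```

In this toy case the dissimilar sample ends up at the bottom of the ranking, which is the behaviour one would want from an outlier-oriented ranking of text-retrieved clips.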
Document type :
Conference papers
A. Cavallaro and S. Prince and D. Alexander. British Machine Vision Conference, Sep 2009, London, United Kingdom. BMVA Press, pp.125.1-125.11, 2009, 〈http://www.bmva.org/bmvc/2009/Papers/Paper164/Paper164.pdf〉. 〈10.5244/C.23.125〉

https://hal.inria.fr/inria-00440973
Contributor : Thoth Team
Submitted on : Wednesday, April 25, 2012 - 1:50:00 PM
Last modification on : Wednesday, July 9, 2014 - 4:20:18 PM
Document(s) archived on : Tuesday, December 13, 2016 - 5:45:03 PM

Citation

Adrien Gaidon, Marcin Marszalek, Cordelia Schmid. Mining visual actions from movies. A. Cavallaro and S. Prince and D. Alexander. British Machine Vision Conference, Sep 2009, London, United Kingdom. BMVA Press, pp.125.1-125.11, 2009, 〈http://www.bmva.org/bmvc/2009/Papers/Paper164/Paper164.pdf〉. 〈10.5244/C.23.125〉. 〈inria-00440973v2〉
