
Human Focused Action Localization in Video

Alexander Klaser 1, Marcin Marszałek 2, Cordelia Schmid 1, Andrew Zisserman 2
1 LEAR - Learning and recognition in vision
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology
Abstract:

We propose a novel human-centric approach to detect and localize human actions in challenging video data, such as Hollywood movies. Our goal is to localize actions both temporally within the video and spatially within each frame. We achieve this by first obtaining generic spatio-temporal human tracks and then detecting specific actions within these tracks using a sliding window classifier.
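The second stage described above, a temporal sliding window over an existing human track, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the `score_fn` callback, the window lengths, the stride and the threshold are all hypothetical, and the abstract does not specify any of them.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_window_detections(
    track: Sequence,                        # per-frame items of one human track (hypothetical layout)
    score_fn: Callable[[Sequence], float],  # hypothetical stand-in for an action classifier
    window_lengths: Sequence[int],          # temporal extents to test, in frames (assumed)
    stride: int,                            # step between window starts, in frames (assumed)
    threshold: float,                       # minimum score to report a detection (assumed)
) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for every window scoring at least `threshold`.

    `end` is exclusive. Results are sorted by descending score.
    """
    detections = []
    for length in window_lengths:
        # Slide a window of the given length along the track.
        for start in range(0, len(track) - length + 1, stride):
            score = score_fn(track[start:start + length])
            if score >= threshold:
                detections.append((start, start + length, score))
    detections.sort(key=lambda d: d[2], reverse=True)
    return detections


# Toy usage: each frame carries a single number, and the "classifier" is its mean.
toy_track = [0, 0, 1, 1, 1, 0]
toy_hits = sliding_window_detections(
    toy_track, lambda w: sum(w) / len(w), window_lengths=[3], stride=1, threshold=0.9
)
```

Because the search runs only along the track, the spatial extent of each detection comes from the track itself, which is what allows the same tracks to be reused for several action classes.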

We make the following contributions: (i) We show that splitting the action localization task into spatial and temporal search leads to an efficient localization algorithm where generic human tracks can be reused to recognize multiple human actions; (ii) We develop a human detector and tracker which is able to cope with a wide range of postures, articulations, motions and camera viewpoints. The tracker includes detection interpolation and a principled classification stage to suppress false positive tracks; (iii) We propose a track-aligned 3D-HOG action representation, investigate its parameters, and show that action localization benefits from using tracks; and (iv) We introduce a new action localization dataset based on Hollywood movies.
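Contribution (ii) mentions detection interpolation within the tracker. The abstract does not say how the interpolation is done; the sketch below assumes plain linear interpolation of bounding-box coordinates between the nearest detected frames, purely for illustration. The `Box` layout `(x_min, y_min, x_max, y_max)` and the function name are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def interpolate_track(detections: Dict[int, Box]) -> Dict[int, Box]:
    """Fill frames missing between detections by linear interpolation.

    `detections` maps frame index to a box. Frames before the first or
    after the last detection are left untouched (no extrapolation).
    """
    frames = sorted(detections)
    filled = dict(detections)
    # Walk over consecutive pairs of detected frames and fill the gap between them.
    for f0, f1 in zip(frames, frames[1:]):
        b0, b1 = detections[f0], detections[f1]
        for f in range(f0 + 1, f1):
            t = (f - f0) / (f1 - f0)  # fractional position inside the gap
            filled[f] = tuple((1 - t) * a + t * b for a, b in zip(b0, b1))
    return filled


# Toy usage: detections at frames 0 and 4, frames 1 to 3 are filled in.
toy_filled = interpolate_track({0: (0.0, 0.0, 10.0, 10.0), 4: (4.0, 8.0, 14.0, 18.0)})
```

A linear fill is only one option; a real tracker may use appearance or motion cues instead, and nothing in the abstract confirms which is used here.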

Results are presented on a number of real-world movies with crowded, dynamic environments, partial occlusion, and cluttered backgrounds. On the Coffee&Cigarettes dataset we significantly improve over the state of the art. Furthermore, we obtain excellent results on the new Hollywood-Localization dataset.

Document type: Conference papers
Contributor: Alexander Klaser
Submitted on: Friday, September 3, 2010 - 1:59:16 PM
Last modification on: Thursday, January 20, 2022 - 5:30:16 PM
Long-term archiving on: Tuesday, October 23, 2012 - 3:30:46 PM




Alexander Klaser, Marcin Marszałek, Cordelia Schmid, Andrew Zisserman. Human Focused Action Localization in Video. SGA 2010 - International Workshop on Sign, Gesture, and Activity, ECCV 2010 Workshops, Sep 2010, Hersonissos, Heraklion, Crete, Greece. pp.219-233, ⟨10.1007/978-3-642-35749-7_17⟩. ⟨inria-00514845⟩


