Conference paper, Year: 2010

Human Focused Action Localization in Video

Alexander Klaser
Marcin Marszałek
Cordelia Schmid
Andrew Zisserman


We propose a novel human-centric approach to detect and localize human actions in challenging video data, such as Hollywood movies. Our goal is to localize actions in time through the video and spatially in each frame. We achieve this by first obtaining generic spatio-temporal human tracks and then detecting specific actions within these using a sliding window classifier.
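
The second stage described above, a sliding window classifier run along an existing human track, can be sketched roughly as follows. This is only an illustration under stated assumptions: each frame of a track is reduced to a single toy number, and the names `sliding_window_detections`, `score_window`, and `mean_score` are hypothetical stand-ins, not the classifier or features used in the paper.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_window_detections(
    track_features: Sequence[float],
    score_window: Callable[[Sequence[float]], float],
    window_lengths: Sequence[int] = (4, 8),
    stride: int = 2,
    threshold: float = 0.5,
) -> List[Tuple[int, int, float]]:
    """Slide temporal windows of several lengths along one human track.

    Returns (start_frame, end_frame, score) for every window whose score
    reaches the threshold, sorted by decreasing score. The end frame is
    exclusive. All parameter defaults are illustrative assumptions.
    """
    detections = []
    n = len(track_features)
    for length in window_lengths:
        # Windows that would run past the end of the track are skipped.
        for start in range(0, n - length + 1, stride):
            end = start + length
            score = score_window(track_features[start:end])
            if score >= threshold:
                detections.append((start, end, score))
    detections.sort(key=lambda d: d[2], reverse=True)
    return detections


def mean_score(window: Sequence[float]) -> float:
    """Toy scoring function (assumption): average per-frame evidence."""
    return sum(window) / len(window)
```

Because the search runs only along the temporal axis of a precomputed track, the same track can be rescored with a different `score_window` for each action class, which is the reuse the abstract refers to.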

We make the following contributions: (i) We show that splitting the action localization task into spatial and temporal search leads to an efficient localization algorithm where generic human tracks can be reused to recognize multiple human actions; (ii) We develop a human detector and tracker which is able to cope with a wide range of postures, articulations, motions and camera viewpoints. The tracker includes detection interpolation and a principled classification stage to suppress false positive tracks; (iii) We propose a track-aligned 3D-HOG action representation, investigate its parameters, and show that action localization benefits from using tracks; and (iv) We introduce a new action localization dataset based on Hollywood movies.
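
Contribution (ii) mentions detection interpolation inside the tracker. A minimal sketch of one way to fill frames that lack a detection is shown below. It assumes linear interpolation of bounding-box coordinates between the nearest detected frames, which the abstract does not specify; the `Box` layout and the function name `interpolate_track` are hypothetical.

```python
from typing import Dict, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def interpolate_track(detections: Dict[int, Box]) -> Dict[int, Box]:
    """Fill gaps between detected frames by linear interpolation.

    `detections` maps frame index to a bounding box. Frames before the
    first or after the last detection are left out (no extrapolation).
    """
    frames = sorted(detections)
    track: Dict[int, Box] = dict(detections)
    for f0, f1 in zip(frames, frames[1:]):
        b0, b1 = detections[f0], detections[f1]
        for f in range(f0 + 1, f1):
            # t runs from 0 at frame f0 to 1 at frame f1.
            t = (f - f0) / (f1 - f0)
            track[f] = tuple((1 - t) * a + t * b for a, b in zip(b0, b1))
    return track
```

For example, with detections only at frames 0 and 4, the call returns boxes for frames 0 through 4, so a track-aligned descriptor can be computed on every frame.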

Results are presented on a number of real-world movies with crowded, dynamic environments, partial occlusion, and cluttered backgrounds. On the Coffee&Cigarettes dataset we significantly improve over the state of the art. Furthermore, we obtain excellent results on the new Hollywood-Localization dataset.


Dates and versions

inria-00514845, version 1 (03-09-2010)



Alexander Klaser, Marcin Marszałek, Cordelia Schmid, Andrew Zisserman. Human Focused Action Localization in Video. SGA 2010 - International Workshop on Sign, Gesture, and Activity, ECCV 2010 Workshops, Sep 2010, Hersonissos, Heraklion, Crete, Greece. pp.219-233, ⟨10.1007/978-3-642-35749-7_17⟩. ⟨inria-00514845⟩