Conference papers

Learning to track for spatio-temporal action localization

Philippe Weinzaepfel 1 Zaid Harchaoui 1 Cordelia Schmid 1
1 LEAR - Learning and recognition in vision
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology
Abstract: We propose an effective approach for action localization, both in the spatial and temporal domains, in realistic videos. The approach starts by detecting proposals at the frame level and then scores them using a combination of state-of-the-art static and motion features extracted from CNNs. We then track a selection of proposals throughout the video, using a tracking-by-detection approach that leverages a combination of instance-level and class-specific learned detectors. The tracks are scored using a spatio-temporal motion histogram (STMH), a novel descriptor at the track level, in combination with the CNN features. Finally, we perform temporal localization of the action using a sliding-window approach. We present experimental results on the UCF-Sports and J-HMDB action localization datasets, where our approach outperforms the state of the art by a margin of 15% and 7%, respectively, in mAP. Furthermore, we present the first experimental results on the challenging UCF-101 localization dataset with 24 classes, where we also obtain promising performance.
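The final step of the abstract, temporal localization with a sliding window, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each track already carries one score per frame, and it assumes a window is rated by its mean score. The function name `best_window` and its parameters `lengths` and `stride` are hypothetical, introduced only for this example.

```python
from typing import Sequence, Tuple


def best_window(
    scores: Sequence[float],
    lengths: Sequence[int],
    stride: int = 1,
) -> Tuple[int, int, float]:
    """Return (start, end, mean_score) of the best window; end is exclusive.

    Assumption (not from the paper): a window is rated by the mean of the
    per-frame scores it covers.
    """
    if not scores:
        raise ValueError("scores must be non-empty")

    # Prefix sums let us compute any window sum in constant time.
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s)

    best = None
    for length in lengths:
        # Skip window lengths that cannot fit inside the track.
        if length <= 0 or length > len(scores):
            continue
        for start in range(0, len(scores) - length + 1, stride):
            end = start + length
            mean = (prefix[end] - prefix[start]) / length
            if best is None or mean > best[2]:
                best = (start, end, mean)

    if best is None:
        raise ValueError("no window length fits the track")
    return best


if __name__ == "__main__":
    # Toy per-frame scores for one track; the action spans frames 2 to 4.
    track_scores = [0.1, 0.2, 0.9, 0.8, 0.9, 0.1]
    start, end, mean = best_window(track_scores, lengths=[2, 3])
    print(f"frames [{start}, {end}) with mean score {mean:.3f}")
```

Trying several window lengths lets the detected extent adapt to actions of different durations; a mean score is used here only because it does not automatically favor longer windows.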

Contributor: Thoth Team
Submitted on: Thursday, October 1, 2015 - 8:58:36 AM
Last modification on: Thursday, January 20, 2022 - 5:30:20 PM
Long-term archiving on: Wednesday, April 26, 2017 - 10:38:39 PM




Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid. Learning to track for spatio-temporal action localization. ICCV - IEEE International Conference on Computer Vision, Dec 2015, Santiago, Chile. pp.3164-3172, ⟨10.1109/ICCV.2015.362⟩. ⟨hal-01159941v2⟩


