Learning to track for spatio-temporal action localization

Philippe Weinzaepfel 1 Zaid Harchaoui 1 Cordelia Schmid 1
1 LEAR - Learning and recognition in vision
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract: We propose an effective approach for action localization, in both the spatial and temporal domains, in realistic videos. The approach first detects proposals at the frame level and scores them using a combination of state-of-the-art static and motion features extracted from CNNs. We then track a selection of proposals throughout the video, using a tracking-by-detection approach that leverages a combination of instance-level and class-specific learned detectors. The tracks are scored using a spatio-temporal motion histogram (STMH), a novel descriptor at the track level, in combination with the CNN features. Finally, we perform temporal localization of the action using a sliding-window approach. We present experimental results on the UCF-Sports and J-HMDB action localization datasets, where our approach outperforms the state of the art by a margin of 15% and 7% in mAP, respectively. Furthermore, we present the first experimental results on the challenging UCF-101 localization dataset with 24 classes, where we also obtain promising performance.
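The final step of the pipeline, temporal localization with a sliding window, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `best_temporal_window`, the window lengths, the stride, and the use of a mean per-frame score are all assumptions made for illustration (the abstract states that tracks are scored with STMH and CNN features, which this toy example does not compute).

```python
def best_temporal_window(frame_scores, window_lengths=(3, 5), stride=1):
    """Return (start, end, score) for the highest-scoring window of a track.

    frame_scores: per-frame scores along one track (hypothetical input).
    end is exclusive; score is the mean of the frame scores in the window.
    """
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    n = len(frame_scores)

    # Prefix sums let us compute the sum of any window in constant time.
    prefix = [0.0]
    for s in frame_scores:
        prefix.append(prefix[-1] + s)

    best = None
    for length in window_lengths:
        length = min(length, n)  # clamp windows longer than the track
        for start in range(0, n - length + 1, stride):
            end = start + length
            score = (prefix[end] - prefix[start]) / length
            if best is None or score > best[2]:
                best = (start, end, score)
    return best


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1]
    print(best_temporal_window(scores))
```

In this toy input the action is concentrated in frames 2 to 4, so the three-frame window starting at frame 2 is selected. A real system would replace the mean frame score with a learned window-level classifier score.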

https://hal.inria.fr/hal-01159941
Contributor: Thoth Team
Submitted on: Thursday, October 1, 2015 - 8:58:36 AM
Last modification on: Monday, December 17, 2018 - 11:22:02 AM
Long-term archiving on: Wednesday, April 26, 2017 - 10:38:39 PM

Citation

Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid. Learning to track for spatio-temporal action localization. ICCV - IEEE International Conference on Computer Vision, Dec 2015, Santiago, Chile. pp.3164-3172, ⟨10.1109/ICCV.2015.362⟩. ⟨hal-01159941v2⟩
