Multi-region two-stream R-CNN for action detection

Xiaojiang Peng 1, Cordelia Schmid 1
1 Thoth - Apprentissage de modèles à partir de données massives
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann
Abstract: We propose a multi-region two-stream R-CNN model for action detection in realistic videos. We start from frame-level action detection based on Faster R-CNN [1], and make three contributions: (1) we show that a motion region proposal network generates high-quality proposals, which are complementary to those of an appearance region proposal network; (2) we show that stacking optical flow over several frames significantly improves frame-level action detection; and (3) we embed a multi-region scheme in the Faster R-CNN model, which adds complementary information on body parts. We then link frame-level detections with the Viterbi algorithm, and temporally localize an action with the maximum subarray method. Experimental results on the UCF-Sports, J-HMDB and UCF101 action detection datasets show that our approach outperforms the state of the art by a significant margin in both frame-mAP and video-mAP.
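The two post-processing steps named in the abstract (Viterbi linking of frame-level detections, then temporal localization with the maximum subarray method) can be illustrated with a minimal sketch. It is not the authors' implementation. The linking score used here (detection score plus a weighted IoU overlap between consecutive boxes), the weight `lam`, the function names, and the idea of subtracting a threshold from per-frame scores before the maximum subarray search are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[Box, float]             # (box, class score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_detections(frames: List[List[Detection]], lam: float = 1.0) -> List[int]:
    """Viterbi-style dynamic program: pick one detection per frame so that the
    sum of detection scores plus lam * IoU between consecutive boxes is maximal.
    Assumes every frame holds at least one detection (hypothetical simplification).
    Returns the index of the chosen detection in each frame."""
    best = [[score for _, score in frames[0]]]   # best[t][j]: best path score ending at j
    back: List[List[int]] = []                   # back[t-1][j]: predecessor of j in frame t-1
    for t in range(1, len(frames)):
        cur_best, cur_back = [], []
        for box, score in frames[t]:
            cands = [best[-1][i] + lam * iou(prev_box, box)
                     for i, (prev_box, _) in enumerate(frames[t - 1])]
            i_star = max(range(len(cands)), key=cands.__getitem__)
            cur_best.append(cands[i_star] + score)
            cur_back.append(i_star)
        best.append(cur_best)
        back.append(cur_back)
    # Backtrack from the best final state.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return path[::-1]


def max_subarray(scores: List[float]) -> Tuple[int, int, float]:
    """Kadane's algorithm: (start, end inclusive, sum) of the contiguous run
    with the largest sum. Scores are expected to be shifted so that frames
    outside the action tend to be negative (assumed thresholding step)."""
    best_sum, best_start, best_end = float("-inf"), 0, 0
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            cur_sum, cur_start = s, i    # start a new run at frame i
        else:
            cur_sum += s                 # extend the current run
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i
    return best_start, best_end, best_sum
```

In use, `link_detections` would be run once per action class to build a tube, and `max_subarray` would then be applied to that tube's shifted per-frame scores to trim it to the frames where the action is most likely taking place.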
Document type:
Conference paper
ECCV 2016 - European Conference on Computer Vision, Oct 2016, Amsterdam, Netherlands


https://hal.inria.fr/hal-01349107
Contributor: Thoth Team
Submitted on: Tuesday, 26 July 2016 - 17:10:23
Last modified on: Thursday, 8 December 2016 - 13:30:39

Files

PC_ECCV16_TS-R-CNN.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01349107, version 1

Citation

Xiaojiang Peng, Cordelia Schmid. Multi-region two-stream R-CNN for action detection. ECCV 2016 - European Conference on Computer Vision, Oct 2016, Amsterdam, Netherlands. 〈hal-01349107v1〉
