Modeling Spatio-Temporal Human Track Structure for Action Localization

Guilhem Chéron 1, 2 Anton Osokin 1, 3 Ivan Laptev 1 Cordelia Schmid 2
1 WILLOW - Models of visual object recognition and scene understanding
Inria de Paris, DI-ENS - Département d'informatique de l'École normale supérieure
2 Thoth - Learning models from massive data
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann
Abstract: This paper addresses spatio-temporal localization of human actions in video. To localize actions in time, we propose a recurrent localization network (RecLNet) designed to model the temporal structure of actions at the level of person tracks. Our model is trained to simultaneously recognize and localize action classes in time and is based on two-layer gated recurrent units (GRU) applied separately to two streams, i.e. appearance and optical flow. When combined with state-of-the-art person detection and tracking, our model substantially improves spatio-temporal action localization in videos, with the gain mainly due to improved temporal localization. We evaluate our method on two recent datasets for spatio-temporal action localization, UCF101-24 and DALY, demonstrating a significant improvement over the state of the art.
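The abstract describes a two-stream model in which a 2-layer GRU per stream scores each time step of a person track. A minimal sketch of that idea in PyTorch is given below; all dimensions, the linear classification heads, and the late fusion by averaging are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class RecLNetSketch(nn.Module):
    """Hypothetical sketch: a 2-layer GRU per stream (appearance and
    optical flow) produces per-frame action scores along a person track.
    Feature dims, hidden size, and score fusion are illustrative guesses."""

    def __init__(self, app_dim=4096, flow_dim=4096, hidden=256, num_classes=24):
        super().__init__()
        # One 2-layer GRU per stream, as described in the abstract.
        self.gru_app = nn.GRU(app_dim, hidden, num_layers=2, batch_first=True)
        self.gru_flow = nn.GRU(flow_dim, hidden, num_layers=2, batch_first=True)
        # Per-frame classifiers (+1 for a background/no-action label).
        self.cls_app = nn.Linear(hidden, num_classes + 1)
        self.cls_flow = nn.Linear(hidden, num_classes + 1)

    def forward(self, app_feats, flow_feats):
        # app_feats, flow_feats: (batch, track_length, feat_dim)
        h_app, _ = self.gru_app(app_feats)
        h_flow, _ = self.gru_flow(flow_feats)
        # Per-frame class scores from each stream, fused by averaging
        # (an assumption; the fusion scheme is not specified here).
        return 0.5 * (self.cls_app(h_app) + self.cls_flow(h_flow))

model = RecLNetSketch()
scores = model(torch.randn(1, 50, 4096), torch.randn(1, 50, 4096))
# scores has shape (batch, track_length, num_classes + 1) = (1, 50, 25)
```

Operating at the track level, rather than on whole frames, lets the recurrent units decide per frame whether the tracked person is performing the action, which is what yields the temporal localization.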
Document type:
Preprint, Working paper
2019

https://hal.inria.fr/hal-01979583
Contributor: Guilhem Chéron <>
Submitted on: Sunday, January 13, 2019 - 14:36:01
Last modified on: Wednesday, January 30, 2019 - 11:07:44


Identifiers

  • HAL Id : hal-01979583, version 1
  • ARXIV : 1806.11008

Citation

Guilhem Chéron, Anton Osokin, Ivan Laptev, Cordelia Schmid. Modeling Spatio-Temporal Human Track Structure for Action Localization. 2019. 〈hal-01979583〉

Metrics

Record views: 71