Human Activity Recognition with Pose-driven Attention to RGB

Fabien Baradel (1), Christian Wolf (1, 2), Julien Mille (3)
(1) imagine - Extraction de Caractéristiques et Identification
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
(2) CHROMA - Robots coopératifs et adaptés à la présence humaine en environnements dynamiques
Inria Grenoble - Rhône-Alpes, CITI - CITI Centre of Innovation in Telecommunications and Integration of services
Abstract: We address human action recognition from multi-modal video data involving articulated pose and RGB frames, and propose a two-stream approach. The pose stream is processed with a convolutional model that takes as input a 3D tensor holding data from a sub-sequence. A specific joint ordering, which respects the topology of the human body, ensures that different convolutional layers correspond to meaningful levels of abstraction. The raw RGB stream is handled by a spatio-temporal soft-attention mechanism conditioned on features from the pose network. An LSTM network receives input from a set of image locations at each instant. A trainable glimpse sensor extracts features at a set of pre-defined locations specified by the pose stream, namely the four hands of the two people involved in the activity. Appearance features give important cues on hand motion and on objects held in each hand. We show that it is beneficial to shift the attention to different hands at different time steps, depending on the activity itself. Finally, a temporal attention mechanism learns how to fuse LSTM features over time. State-of-the-art results are achieved on the largest dataset for human activity recognition, namely NTU-RGB+D.
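The two attention steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bilinear scoring with a projection matrix `W`, the scoring vector `v`, the feature sizes, and the random inputs are all assumptions made for illustration, and the LSTM is omitted, so the per-frame attended features stand in for LSTM states.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Convex combination of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def spatial_attention(glimpses, pose_feat, W):
    """Soft attention over hand glimpses, conditioned on pose features.

    glimpses:  one feature vector per hand (four in the abstract's setting)
    pose_feat: feature vector from the pose stream
    W:         hypothetical projection from pose space to glimpse space
    """
    query = [dot(row, pose_feat) for row in W]
    weights = softmax([dot(query, g) for g in glimpses])
    return weighted_sum(weights, glimpses), weights


def temporal_attention(states, v):
    """Soft attention over time steps; v is a hypothetical scoring vector."""
    weights = softmax([dot(v, h) for h in states])
    return weighted_sum(weights, states), weights


# Toy run with random features (assumed sizes, no real video data).
random.seed(0)
T, HANDS, D, P = 5, 4, 8, 6
W = [[random.gauss(0, 1) for _ in range(P)] for _ in range(D)]
v = [random.gauss(0, 1) for _ in range(D)]

per_step = []
for _ in range(T):
    glimpses = [[random.gauss(0, 1) for _ in range(D)] for _ in range(HANDS)]
    pose = [random.gauss(0, 1) for _ in range(P)]
    attended, hand_w = spatial_attention(glimpses, pose, W)
    per_step.append(attended)  # in the paper's setting this would feed an LSTM

clip_feat, time_w = temporal_attention(per_step, v)
print("hand weights at last step:", [round(w, 3) for w in hand_w])
print("temporal weights:", [round(w, 3) for w in time_w])
```

Both sets of weights are non-negative and sum to one, so each step yields a convex combination: first across the four hands within a frame, then across frames within the clip.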
Document type:
Conference paper
BMVC 2018 - 29th British Machine Vision Conference, Sep 2018, Newcastle, United Kingdom. pp.1-14

Cited literature: 46 references

https://hal.inria.fr/hal-01828083
Contributor: Christian Wolf
Submitted on: Tuesday, 24 July 2018 - 13:57:26
Last modified on: Monday, 24 September 2018 - 09:32:13

File

bmvc_review.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01828083, version 1

Citation

Fabien Baradel, Christian Wolf, Julien Mille. Human Activity Recognition with Pose-driven Attention to RGB. BMVC 2018 - 29th British Machine Vision Conference, Sep 2018, Newcastle, United Kingdom. pp.1-14. 〈hal-01828083〉

Metrics

Record views: 252

File downloads: 68