
Toward unsupervised human activity and gesture recognition in videos

Farhood Negin 1 
1 STARS - Spatio-Temporal Activity Recognition Systems
CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : The main goal of this thesis is to propose a complete framework for the automatic discovery, modeling and recognition of human activities in videos. To model and recognize activities in long-term videos, we propose a framework that combines global and local perceptual information from the scene and constructs hierarchical activity models accordingly. In the first variation of the framework, a supervised classifier based on Fisher vectors is trained, and the predicted semantic labels are embedded in the constructed hierarchical models. In the second variation, to obtain a completely unsupervised framework, the trained visual codebooks are stored in the models instead of the semantic labels. We evaluate the proposed frameworks on two realistic Activities of Daily Living datasets recorded from patients in a hospital environment. Furthermore, to model fine motions of the human body, we propose four different gesture recognition frameworks, each of which accepts one data modality or a combination of modalities as input. We evaluate the developed frameworks in the context of a medical diagnostic test, the Praxis test. The Praxis test is a gesture-based diagnostic test that has been accepted as diagnostically indicative of cortical pathologies such as Alzheimer’s disease. We suggest a new challenge in gesture recognition: obtaining an objective opinion about correct and incorrect performances of very similar gestures. The experiments show the effectiveness of our deep-learning-based approach in gesture recognition and performance assessment tasks.

Cited literature [287 references]
Contributor : ABES STAR
Submitted on : Tuesday, February 26, 2019 - 4:12:25 PM
Last modification on : Saturday, June 25, 2022 - 11:34:42 PM
Long-term archiving on : Monday, May 27, 2019 - 2:48:01 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01947341, version 2



Farhood Negin. Toward unsupervised human activity and gesture recognition in videos. Computer Vision and Pattern Recognition [cs.CV]. Université Côte d'Azur, 2018. English. ⟨NNT : 2018AZUR4246⟩. ⟨tel-01947341v2⟩


