Theses

Learning visual models for person detection and action prediction

Tuan-Hung Vu 1, 2
Abstract: In this thesis, we address person detection and action prediction in visual data. We develop models that learn representations for visual data and the structure of the output space while making use of contextual cues and temporal consistency. We also propose a predictive model to anticipate a person's attention in given static scenes. In the first part of the thesis, we explore the strong association between scene categories and actions. Based on that understanding, we formulate a new task of predicting human actions in static scenes. To train and evaluate the proposed model, we collect a new dataset of scene-action associations, named the SUN Action dataset. The success of this task enables potential applications such as affordance geo-localization. The second part of the thesis focuses on person and generic object detection in videos. First, we construct contextual models to enhance person detection in individual frames. We train and evaluate our method on our new HollywoodHeads dataset of annotated human heads in movies. Our models consistently improve detection performance over baseline detectors. Second, we introduce a novel convolutional neural network architecture that operates on short clips of frames to leverage temporal consistency and to learn spatio-temporal representations. Through experiments, we demonstrate the benefit of our spatio-temporal representations for object detection in videos. Last, we learn video representations that incorporate multiscale information on coarse time scales and design practical frameworks that achieve accuracy, efficiency and predictive power. Compared to per-frame features, our video representations show the largest detection improvement on frames degraded by fast motion.

https://hal.inria.fr/tel-01861455
Contributor : Tuan-Hung Vu
Submitted on : Friday, August 24, 2018 - 2:44:24 PM
Last modification on : Tuesday, May 4, 2021 - 2:06:03 PM
Long-term archiving on : Sunday, November 25, 2018 - 2:08:31 PM

File

thesis_Tuan-Hung.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-01861455, version 1

Citation

Tuan-Hung Vu. Learning visual models for person detection and action prediction. Computer Vision and Pattern Recognition [cs.CV]. Ecole Normale Superieure de Paris - ENS Paris, 2018. English. ⟨tel-01861455⟩
