inria-00626929, version 3
Explicit modeling of human-object interactions in realistic videos
N° RT-0411 (2011)
Abstract: We introduce an approach for learning human actions as interactions between persons and objects in realistic videos. Previous works typically represent actions with low-level features such as image gradients or optical flow. In contrast, we explicitly localize in space and track over time both the object and the person, and represent an action as the trajectory of the object with respect to the person's position. Our approach relies on state-of-the-art methods for human detection [28], object detection [11], and tracking [3]. We show that this yields human and object tracks of sufficient quality to model and localize human-object interactions in realistic videos. Our human-object interaction features capture the relative trajectory of the object with respect to the human. Experimental results on the Coffee & Cigarettes dataset [22] and the video dataset of [17] show that (i) our explicit human-object model is an informative cue for action recognition, and (ii) it is complementary to traditional low-level descriptors such as 3D-HOG extracted over human tracks. Combining our human-object interaction features with 3D-HOG features [20] improves over the performance of either feature type alone as well as over the state of the art.
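As a rough illustration of the idea of describing an action by the object's trajectory relative to the person, the sketch below computes per-frame offsets between two bounding-box tracks. It is not the report's feature definition, which the abstract does not specify. The function names `box_center` and `relative_trajectory`, the box layout, the use of box centers, and the normalization by person height are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels, one box per frame.
Box = Tuple[float, float, float, float]


def box_center(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def relative_trajectory(
    person_track: List[Box], object_track: List[Box]
) -> List[Tuple[float, float]]:
    """Object center offset from the person center, per frame.

    Hypothetical helper: offsets are divided by the person box height so
    that the values do not depend on how large the person appears.
    """
    if len(person_track) != len(object_track):
        raise ValueError("tracks must cover the same frames")

    trajectory = []
    for person, obj in zip(person_track, object_track):
        px, py = box_center(person)
        ox, oy = box_center(obj)
        height = person[3] - person[1]
        if height <= 0:
            raise ValueError("person box must have positive height")
        trajectory.append(((ox - px) / height, (oy - py) / height))
    return trajectory


# Two frames: the object starts at the person's center, then moves up and right.
person = [(0.0, 0.0, 100.0, 200.0), (10.0, 0.0, 110.0, 200.0)]
cup = [(40.0, 80.0, 60.0, 120.0), (70.0, 30.0, 90.0, 70.0)]
offsets = relative_trajectory(person, cup)
```

Expressing the offsets in the person's frame means the sequence stays the same if the person and object move together across the image, which is the intuition behind a relative rather than absolute trajectory.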
- a – INRIA
- 1: CNRS : UMR5527 – INRIA – Laboratoire Jean Kuntzmann – Université Joseph Fourier - Grenoble I – Institut National Polytechnique de Grenoble (INPG)
- 2: ETH Zurich
- Domain : Computer Science/Computer Vision and Pattern Recognition
- Internal note : RT-0411
- Available versions : v1 (2011-09-27) v2 (2011-09-28) v3 (2011-09-28) v4 (2012-05-10)
- http://hal.inria.fr/inria-00626929
- oai:hal.inria.fr:inria-00626929
- Submitted on: Wednesday, 28 September 2011 15:28:32
- Updated on: Monday, 10 October 2011 16:12:00




