
Explicit modeling of human-object interactions in realistic videos

Alessandro Prest 1, 2, Vittorio Ferrari 2, Cordelia Schmid 1
1 LEAR - Learning and recognition in vision
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology
Abstract : We introduce an approach for learning human actions as interactions between persons and objects in realistic videos. Previous work typically represents actions with low-level features such as image gradients or optical flow. In contrast, we explicitly localize in space and track over time both the object and the person, and represent an action as the trajectory of the object with respect to the person's position. Our approach relies on state-of-the-art techniques for human detection [32], object detection [10], and tracking [39]. We show that this results in human and object tracks of sufficient quality to model and localize human-object interactions in realistic videos. Our human-object interaction features capture the relative trajectory of the object with respect to the human. Experimental results on the Coffee & Cigarettes dataset [25], the video dataset of [19], and the Rochester Daily Activities dataset [29] show that (i) our explicit human-object model is an informative cue for action recognition and (ii) it is complementary to traditional low-level descriptors such as 3D-HOG [23] extracted over human tracks. We show that combining our human-object interaction features with 3D-HOG improves over their individual performance as well as over the state of the art [23], [29].
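The idea of describing an action by the trajectory of the object relative to the person can be illustrated with a minimal sketch. This is not the authors' feature definition: the function name `relative_trajectory`, the `(x1, y1, x2, y2)` box format, and the normalization by person height are all assumptions made for illustration, and the tracks are made-up numbers rather than data from the paper.

```python
import numpy as np


def relative_trajectory(person_boxes, object_boxes):
    """Per-frame offset of the object center from the person center.

    Both inputs are (T, 4) sequences of boxes in (x1, y1, x2, y2) format,
    one box per frame. The offset is divided by the person's box height so
    that it does not depend on how large the person appears in the frame
    (an assumed normalization, chosen for this sketch).
    """
    person = np.asarray(person_boxes, dtype=float)
    obj = np.asarray(object_boxes, dtype=float)
    if person.shape != obj.shape or person.ndim != 2 or person.shape[1] != 4:
        raise ValueError("expected two arrays of shape (T, 4)")

    # Box centers: midpoint of the top-left and bottom-right corners.
    person_center = (person[:, :2] + person[:, 2:]) / 2.0
    object_center = (obj[:, :2] + obj[:, 2:]) / 2.0

    person_height = person[:, 3] - person[:, 1]
    if np.any(person_height <= 0):
        raise ValueError("person boxes must have positive height")

    # Broadcast the (T,) heights over the (T, 2) offsets.
    return (object_center - person_center) / person_height[:, None]


# Hypothetical three-frame tracks: the person drifts right while the
# object moves up and toward the person's center line.
person_track = [[100, 50, 200, 250], [110, 50, 210, 250], [120, 50, 220, 250]]
object_track = [[190, 140, 210, 160], [180, 110, 200, 130], [170, 80, 190, 100]]

trajectory = relative_trajectory(person_track, object_track)
print(trajectory)
```

Expressing the object position in the person's frame of reference removes the person's own motion from the descriptor, so the same interaction yields a similar trajectory wherever the person stands in the image.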
Document type : Journal articles

Cited literature [13 references]
Contributor : Alessandro Prest
Submitted on : Wednesday, July 25, 2012 - 11:36:27 PM
Last modification on : Thursday, January 20, 2022 - 5:28:04 PM
Long-term archiving on : Friday, December 16, 2016 - 3:13:02 AM


Files produced by the author(s)




Alessandro Prest, Vittorio Ferrari, Cordelia Schmid. Explicit modeling of human-object interactions in realistic videos. IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, 2013, 35 (4), pp.835-848. ⟨10.1109/TPAMI.2012.175⟩. ⟨hal-00720847⟩


