Conference papers

Learning person-object interactions for action recognition in still images

V. Delaitre 1,*, J. Sivic 1, I. Laptev 1
* Corresponding author
1 WILLOW - Models of visual object recognition and scene understanding
DI-ENS - Département d'informatique - ENS Paris, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique (UMR 8548)
Abstract: We investigate a discriminatively trained model of person-object interactions for recognizing common human actions in still images. We build on the locally orderless spatial pyramid bag-of-features model, which has been shown to perform extremely well on a range of object, scene and human action recognition tasks. We introduce three principal contributions. First, we replace the standard quantized local HOG/SIFT features with stronger, discriminatively trained body part and object detectors. Second, we introduce new person-object interaction features based on spatial co-occurrences of individual body parts and objects. Third, we address the combinatorial problem of a large number of possible interaction pairs and propose a discriminative selection procedure using a linear support vector machine (SVM) with a sparsity-inducing regularizer. Learning action-specific body part and object interactions bypasses the difficult problem of estimating the complete human body pose configuration. Benefits of the proposed model are shown on human action recognition in consumer photographs, where it outperforms the strong bag-of-features baseline.
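The interaction features described above pair each body-part detector with each object detector and record how their detections co-occur in space. The following is a minimal sketch under stated assumptions, not the authors' implementation: detections are axis-aligned boxes with confidence scores in [0, 1], the five-way relation vocabulary (overlap, left, right, above, below) and the max-of-score-products pooling are hypothetical choices, and the names `relation` and `interaction_features` are illustrative only.

```python
from itertools import product
from typing import Dict, List, Tuple

# A detection is (x1, y1, x2, y2, score); image y grows downward.
# Assumption for this sketch: scores are calibrated to [0, 1].
Box = Tuple[float, float, float, float, float]

# Hypothetical coarse spatial-relation vocabulary (not taken from the paper).
RELATIONS = ("overlap", "left", "right", "above", "below")


def relation(a: Box, b: Box) -> str:
    """Coarse spatial relation of box b with respect to box a."""
    ax1, ay1, ax2, ay2, _ = a
    bx1, by1, bx2, by2, _ = b
    # Strict inequalities: boxes that only touch do not overlap.
    if bx1 < ax2 and ax1 < bx2 and by1 < ay2 and ay1 < by2:
        return "overlap"
    # Otherwise compare box centres along the axis of larger displacement.
    dx = (bx1 + bx2) / 2 - (ax1 + ax2) / 2
    dy = (by1 + by2) / 2 - (ay1 + ay2) / 2
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "below" if dy > 0 else "above"


def interaction_features(
    parts: Dict[str, List[Box]], objects: Dict[str, List[Box]]
) -> Dict[Tuple[str, str, str], float]:
    """One entry per (part type, object type, relation) triple.

    Each entry holds the maximum product of detection scores over all
    part/object detection pairs that fall into that relation, or 0.0.
    """
    feats = {key: 0.0 for key in product(sorted(parts), sorted(objects), RELATIONS)}
    for p, part_boxes in parts.items():
        for o, object_boxes in objects.items():
            for pb in part_boxes:
                for ob in object_boxes:
                    key = (p, o, relation(pb, ob))
                    feats[key] = max(feats[key], pb[4] * ob[4])
    return feats


if __name__ == "__main__":
    parts = {"hand": [(10, 10, 20, 20, 0.9)], "head": [(12, 0, 22, 8, 0.8)]}
    objects = {"guitar": [(18, 15, 40, 30, 0.5)]}
    for key, value in sorted(interaction_features(parts, objects).items()):
        if value > 0:
            print(key, round(value, 2))
    # ('hand', 'guitar', 'overlap') 0.45
    # ('head', 'guitar', 'below') 0.4
```

With P part types, O object types and R relations the vector has P × O × R entries, so it grows quickly as detectors are added. This is the combinatorial growth that the sparse SVM selection step in the abstract is meant to prune.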
Document type: Conference paper

Cited literature: 37 references
Contributor: Josef Sivic
Submitted on: Monday, December 5, 2011, 11:47:52 AM
Last modification on: Thursday, March 17, 2022, 10:08:39 AM
Long-term archiving on: Friday, November 16, 2012, 2:21:04 PM


Files produced by the author(s)


• HAL Id: hal-00648156, version 1



V. Delaitre, J. Sivic, I. Laptev. Learning person-object interactions for action recognition in still images. NIPS 2011: Twenty-Fifth Annual Conference on Neural Information Processing Systems, NIPS Foundation, Dec 2011, Granada, Spain. ⟨hal-00648156⟩


