Explicit modeling of human-object interactions in realistic videos

Alessandro Prest; Vittorio Ferrari; Cordelia Schmid

doi:10.1109/TPAMI.2012.175

Journal article, IEEE Transactions on Pattern Analysis and Machine Intelligence, Year: 2013


Alessandro Prest

Role: Author
PersonId: 879018

Learning and recognition in vision

Eidgenössische Technische Hochschule - Swiss Federal Institute of Technology [Zürich]

Vittorio Ferrari

Role: Author
PersonId: 852592

Eidgenössische Technische Hochschule - Swiss Federal Institute of Technology [Zürich]

Cordelia Schmid

Role: Author
PersonId: 831154

Learning and recognition in vision

Abstract

We introduce an approach for learning human actions as interactions between persons and objects in realistic videos. Previous work typically represents actions with low-level features such as image gradients or optical flow. In contrast, we explicitly localize in space and track over time both the object and the person, and represent an action as the trajectory of the object with respect to the person's position. Our approach relies on state-of-the-art techniques for human detection [32], object detection [10], and tracking [39]. We show that this results in human and object tracks of sufficient quality to model and localize human-object interactions in realistic videos. Our human-object interaction features capture the relative trajectory of the object with respect to the human. Experimental results on the Coffee & Cigarettes dataset [25], the video dataset of [19], and the Rochester Daily Activities dataset [29] show that (i) our explicit human-object model is an informative cue for action recognition; (ii) it is complementary to traditional low-level descriptors such as 3D-HOG [23] extracted over human tracks. We show that combining our human-object interaction features with 3D-HOG improves over their individual performance as well as over the state of the art [23], [29].
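As a rough illustration of what "the trajectory of the object with respect to the person" can mean, the sketch below computes a per-frame offset between the centers of an object box and a person box. The box layout, the function names, and the normalization by person height are all assumptions made for this example. The abstract does not specify how the features are actually computed, so this should not be read as the authors' method.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_center(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def relative_trajectory(
    person_track: Sequence[Box], object_track: Sequence[Box]
) -> List[Tuple[float, float]]:
    """Object position relative to the person, one entry per frame.

    Assumption: offsets are divided by the person box height so that the
    result does not depend on how large the person appears in the frame.
    """
    if len(person_track) != len(object_track):
        raise ValueError("tracks must cover the same frames")

    trajectory = []
    for person, obj in zip(person_track, object_track):
        px, py = box_center(person)
        ox, oy = box_center(obj)
        height = person[3] - person[1]
        if height <= 0:
            raise ValueError("person box must have positive height")
        trajectory.append(((ox - px) / height, (oy - py) / height))
    return trajectory
```

Given two tracks of equal length, `relative_trajectory(person_track, object_track)` returns one normalized `(dx, dy)` offset per frame. A sequence of such offsets could then serve as input to a classifier, alongside other descriptors.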

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

CC.pdf (7.24 MB)

eyecatcher.jpg (1.05 MB)

Origin: Files produced by the author(s)

Format: Other


https://inria.hal.science/hal-00720847

Submitted on: Wednesday, July 25, 2012, 23:36:27

Last modified on: Thursday, April 4, 2024, 20:59:04

Long-term archiving on: Friday, December 16, 2016, 03:13:02

Dates and versions

hal-00720847, version 1 (25-07-2012)

Identifiers

HAL Id: hal-00720847, version 1
DOI: 10.1109/TPAMI.2012.175

Cite

Alessandro Prest, Vittorio Ferrari, Cordelia Schmid. Explicit modeling of human-object interactions in realistic videos. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35 (4), pp.835-848. ⟨10.1109/TPAMI.2012.175⟩. ⟨hal-00720847⟩


Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA LJK LJK_GI LJK_GI_LEAR INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

430 views

626 downloads
