Explicit modeling of human-object interactions in realistic videos

Alessandro Prest; Vittorio Ferrari; Cordelia Schmid

Reports (Technical Report) Year : 2011


Alessandro Prest

Function : Author
PersonId : 879018

Learning and recognition in vision

Eidgenössische Technische Hochschule - Swiss Federal Institute of Technology [Zürich]

Vittorio Ferrari

Function : Author
PersonId : 852592

Eidgenössische Technische Hochschule - Swiss Federal Institute of Technology [Zürich]

Cordelia Schmid

Function : Author
PersonId : 831154

Learning and recognition in vision

Abstract

We introduce an approach for learning human actions as interactions between persons and objects in realistic videos. Previous works typically represent actions with low-level features such as image gradients or optical flow. In contrast, we explicitly localize in space and track over time both the object and the person, and represent an action as the trajectory of the object with respect to the person's position. Our approach relies on state-of-the-art methods for human detection [32] and object detection [10], as well as tracking [39]. We show that this results in human and object tracks of sufficient quality to model and localize human-object interactions in realistic videos. Our human-object interaction features capture the relative trajectory of the object with respect to the human. Experimental results on the Coffee & Cigarettes dataset [25], the video dataset of [19], and the Rochester Daily Activities dataset [29] show that (i) our explicit human-object model is an informative cue for action recognition; and (ii) it is complementary to traditional low-level descriptors such as 3D-HOG extracted over human tracks. Combining our human-object interaction features with 3D-HOG features [23] improves over their separate performance as well as over the state of the art.
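
The idea of describing an action by the trajectory of the object relative to the person can be illustrated with a minimal sketch. It is not the report's implementation: the track format (one axis-aligned bounding box per frame), the use of box centers, and the normalization by person height are all assumptions made for illustration, and the names `box_center` and `relative_trajectory` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical track format: one (x_min, y_min, x_max, y_max) box per frame.
Box = Tuple[float, float, float, float]


def box_center(box: Box) -> Tuple[float, float]:
    """Return the center point of an axis-aligned bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def relative_trajectory(
    person_track: List[Box], object_track: List[Box]
) -> List[Tuple[float, float]]:
    """Per-frame offset of the object center from the person center.

    Offsets are divided by the person's box height so the result does not
    depend on how large the person appears in the frame. This normalization
    is an assumption of the sketch, not a detail taken from the abstract.
    """
    if len(person_track) != len(object_track):
        raise ValueError("tracks must cover the same frames")
    offsets = []
    for person_box, object_box in zip(person_track, object_track):
        person_x, person_y = box_center(person_box)
        object_x, object_y = box_center(object_box)
        height = person_box[3] - person_box[1]
        if height <= 0:
            raise ValueError("person box must have positive height")
        offsets.append(((object_x - person_x) / height,
                        (object_y - person_y) / height))
    return offsets
```

Given a static person track and an object track that moves upward in the image, for example, the returned offsets change only in their second coordinate, which is the kind of relative motion such a feature is meant to expose.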

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

CC.pdf (7.28 MB)

Origin : Files produced by the author(s)

https://inria.hal.science/inria-00626929

Submitted on : Thursday, May 10, 2012-12:16:04 PM

Last modification on : Thursday, April 4, 2024-6:17:53 PM

Long-term archiving on: Thursday, December 15, 2016-5:39:53 AM

Dates and versions

inria-00626929 , version 1 (27-09-2011)

inria-00626929 , version 2 (28-09-2011)

inria-00626929 , version 3 (28-09-2011)

inria-00626929 , version 4 (10-05-2012)

Identifiers

HAL Id : inria-00626929 , version 4

Cite

Alessandro Prest, Vittorio Ferrari, Cordelia Schmid. Explicit modeling of human-object interactions in realistic videos. [Technical Report] RT-0411, 2011. ⟨inria-00626929v4⟩

Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA LJK LJK_GI LJK_GI_LEAR INRIA2 LARA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM
