Scene semantics from long-term observation of people

Vincent Delaitre 1 David Fouhey 2 Ivan Laptev 3 Josef Sivic 3 Abhinav Gupta 2 Alexei A. Efros 2
3 WILLOW - Models of visual object recognition and scene understanding
CNRS - Centre National de la Recherche Scientifique : UMR8548, Inria Paris-Rocquencourt, DI-ENS - Département d'informatique de l'École normale supérieure
Abstract: Our everyday objects support various tasks and can be used by people for different purposes. While object classification is a widely studied topic in computer vision, recognition of object function, i.e., what people can do with an object and how they do it, is rarely addressed. In this paper we construct a functional object description with the aim of recognizing objects by the way people interact with them. We describe scene objects (sofas, tables, chairs) by associated human poses and object appearance. Our model is learned discriminatively from automatically estimated body poses in many realistic scenes. In particular, we make use of time-lapse videos from YouTube, which provide a rich source of common human-object interactions and minimize the effort of manual object annotation. We show how the models learned from human observations significantly improve object recognition and enable prediction of characteristic human poses in new scenes. Results are shown on a dataset of more than 400,000 frames obtained from 146 time-lapse videos of challenging and realistic indoor scenes.
Document type:
Conference paper
Andrew Fitzgibbon and Svetlana Lazebnik and Pietro Perona and Yoichi Sato and Cordelia Schmid. European Conference on Computer Vision, Oct 2012, Florence, Italy. Springer, 7577, pp.284-298, 2012, LNCS - Lecture Notes in Computer Science. 〈10.1007/978-3-642-33783-3_21〉

Cited literature [32 references]

https://hal.inria.fr/hal-01060880
Contributor: Vincent Delaitre
Submitted on: Thursday, 4 September 2014 - 14:31:38
Last modified on: Friday, 25 May 2018 - 12:02:06
Document(s) archived on: Friday, 5 December 2014 - 10:28:29

File

delaitre_ECCV12.pdf
Files produced by the author(s)

Citation

Vincent Delaitre, David Fouhey, Ivan Laptev, Josef Sivic, Abhinav Gupta, et al. Scene semantics from long-term observation of people. Andrew Fitzgibbon and Svetlana Lazebnik and Pietro Perona and Yoichi Sato and Cordelia Schmid. European Conference on Computer Vision, Oct 2012, Florence, Italy. Springer, 7577, pp.284-298, 2012, LNCS - Lecture Notes in Computer Science. 〈10.1007/978-3-642-33783-3_21〉. 〈hal-01060880〉

Metrics

Record views

392

File downloads

241