Scene semantics from long-term observation of people

Vincent Delaitre; David F. Fouhey; Ivan Laptev; Josef Sivic; Abhinav Gupta; Alexei A. Efros

doi:10.1007/978-3-642-33783-3_21

Conference paper. Year: 2012

Vincent Delaitre

Role: Author
PersonId: 959283

Laboratoire d'informatique de l'école normale supérieure

David F. Fouhey

Role: Author
PersonId: 959282

Computer Science Department - Carnegie Mellon University

Ivan Laptev

Role: Author
PersonId: 865349

Models of visual object recognition and scene understanding

Josef Sivic

Role: Author

Models of visual object recognition and scene understanding

Abhinav Gupta

Role: Author
PersonId: 959035

Computer Science Department - Carnegie Mellon University

Alexei A. Efros

Role: Author
PersonId: 959036

Computer Science Department - Carnegie Mellon University

Abstract

Our everyday objects support various tasks and can be used by people for different purposes. While object classification is a widely studied topic in computer vision, recognition of object function, i.e., what people can do with an object and how they do it, is rarely addressed. In this paper we construct a functional object description with the aim of recognizing objects by the way people interact with them. We describe scene objects (sofas, tables, chairs) by associated human poses and object appearance. Our model is learned discriminatively from automatically estimated body poses in many realistic scenes. In particular, we make use of time-lapse videos from YouTube, which provide a rich source of common human-object interactions and minimize the effort of manual object annotation. We show how the models learned from human observations significantly improve object recognition and enable prediction of characteristic human poses in new scenes. Results are shown on a dataset of more than 400,000 frames obtained from 146 time-lapse videos of challenging and realistic indoor scenes.

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

delaitre_ECCV12.pdf (2.67 MB)

Origin: Files produced by the author(s)

Contributor: Vincent Delaitre

https://inria.hal.science/hal-01060880

Submitted on: Thursday, September 4, 2014, 14:31:38

Last modified on: Thursday, February 1, 2024, 10:04:56

Long-term archiving on: Friday, December 5, 2014, 10:28:29

Dates and versions

hal-01060880, version 1 (04-09-2014)

Identifiers

HAL Id: hal-01060880, version 1
DOI: 10.1007/978-3-642-33783-3_21

Cite

Vincent Delaitre, David F. Fouhey, Ivan Laptev, Josef Sivic, Abhinav Gupta, et al. Scene semantics from long-term observation of people. European Conference on Computer Vision, Oct 2012, Florence, Italy. pp. 284-298, ⟨10.1007/978-3-642-33783-3_21⟩. ⟨hal-01060880⟩

Collections

ENS-PARIS UNIV-RENNES1 CNRS INRIA IRISA INRIA2 PSL UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

321 views

342 downloads
