Learning to parse pictures of people

Rémi Ronfard (1), Cordelia Schmid (1), William Triggs (1)
(1) MOVI - Modeling, localization, recognition and interpretation in computer vision
GRAVIR - IMAG - Graphisme, Vision et Robotique, Inria Grenoble - Rhône-Alpes, CNRS - Centre National de la Recherche Scientifique : FR71
Abstract: Detecting people in images is a key problem for video indexing, browsing and retrieval. The main difficulties are the large appearance variations caused by action, clothing, illumination, viewpoint and scale. Our goal is to find people in static video frames using learned models of both the appearance of body parts (head, limbs, hands) and the geometry of their assemblies. We build on Forsyth & Fleck's general ‘body plan’ methodology and on Felzenszwalb & Huttenlocher's dynamic programming approach for efficiently assembling candidate parts into ‘pictorial structures’. However, we replace the rather simple part detectors used in these works with dedicated detectors learned for each body part using Support Vector Machines (SVMs) or Relevance Vector Machines (RVMs). We are not aware of any previous work using SVMs to learn articulated body plans; however, SVMs have been used to detect both whole pedestrians and combinations of rigidly positioned subimages (typically upper body, arms and legs) in street scenes, under a wide range of illumination, pose and clothing variations. RVMs are SVM-like classifiers that offer a well-founded probabilistic interpretation and improved sparsity, which reduces computation. We demonstrate their benefits experimentally in a series of results showing great promise for learning detectors in more general situations.
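The abstract relies on dynamic programming to assemble candidate parts into a pictorial structure. As a rough illustration only, the following is a minimal Python sketch of min-sum dynamic programming over a simplified chain of parts (Felzenszwalb & Huttenlocher's models are trees, not chains). Everything specific here is an assumption, not taken from the paper: the candidate locations, the unary costs (standing in for negated SVM/RVM part-detector scores), the quadratic "spring" cost between consecutive parts, and the function names best_chain_assembly, pairwise_cost and brute_force.

```python
from itertools import product


def pairwise_cost(loc_a, loc_b, ideal_offset, weight=1.0):
    """Hypothetical quadratic 'spring' cost: squared deviation of the
    offset loc_b - loc_a from an assumed ideal offset between two parts."""
    dx = loc_b[0] - loc_a[0] - ideal_offset[0]
    dy = loc_b[1] - loc_a[1] - ideal_offset[1]
    return weight * (dx * dx + dy * dy)


def best_chain_assembly(candidates, unary, offsets):
    """Min-sum dynamic programming over a chain of parts 0..n-1.

    candidates[i]: list of (x, y) candidate locations for part i
    unary[i][k]:   cost of placing part i at candidates[i][k]
                   (assumed here to be a negated part-detector score)
    offsets[i]:    assumed ideal offset from part i to part i + 1
    Returns (total_cost, chosen candidate index for each part).
    """
    # cost[k] = best cost of parts 0..i with part i at candidate k
    cost = list(unary[0])
    backpointers = []
    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        new_cost, arg = [], []
        for k, loc in enumerate(candidates[i]):
            # Best placement of the previous part given part i at loc.
            scores = [cost[j] + pairwise_cost(prev[j], loc, offsets[i - 1])
                      for j in range(len(prev))]
            best_j = min(range(len(prev)), key=scores.__getitem__)
            new_cost.append(scores[best_j] + unary[i][k])
            arg.append(best_j)
        cost = new_cost
        backpointers.append(arg)
    # Pick the best last part, then follow backpointers to the first part.
    k = min(range(len(cost)), key=cost.__getitem__)
    total, path = cost[k], [k]
    for arg in reversed(backpointers):
        k = arg[k]
        path.append(k)
    path.reverse()
    return total, path


def brute_force(candidates, unary, offsets):
    """Exhaustive search over all assemblies, used to check the DP."""
    best = None
    for combo in product(*(range(len(c)) for c in candidates)):
        total = sum(unary[i][k] for i, k in enumerate(combo))
        total += sum(pairwise_cost(candidates[i][combo[i]],
                                   candidates[i + 1][combo[i + 1]], offsets[i])
                     for i in range(len(combo) - 1))
        if best is None or total < best[0]:
            best = (total, list(combo))
    return best


if __name__ == "__main__":
    # Toy data: three parts, each expected 10 pixels below the previous one.
    candidates = [[(0, 0), (5, 5)],
                  [(0, 10), (5, 14), (20, 20)],
                  [(0, 20), (6, 24)]]
    unary = [[-10, -5], [-2, -15, -30], [-10, -8]]
    offsets = [(0, 10), (0, 10)]
    print(best_chain_assembly(candidates, unary, offsets))  # (-26.0, [1, 1, 1])
    print(brute_force(candidates, unary, offsets))          # same result
```

In this toy example the strongest single detection for the middle part (cost -30 at (20, 20)) is rejected because it fits badly with its neighbours, which is the point of combining part detectors with assembly geometry. The nested loops cost O(nK^2) for n parts with K candidates each; Felzenszwalb & Huttenlocher reduce the per-edge cost to linear in K for quadratic deformation costs by using generalized distance transforms.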
Document type:
Conference paper
European Conference on Computer Vision, Copenhagen, Denmark, 2002. Springer, vol. 4, pp. 700-714, 2002

Cited literature: 25 references

Contributor: Rémi Ronfard
Submitted on: Thursday, 9 December 2010, 16:05:50
Last modified on: Wednesday, 11 July 2018, 01:15:57
Document(s) archived on: Thursday, 10 March 2011, 13:53:50


Files produced by the author(s)


• HAL Id: inria-00545109, version 1




Rémi Ronfard, Cordelia Schmid, William Triggs. Learning to parse pictures of people. European Conference on Computer Vision, Copenhagen, Denmark, 2002. Springer, vol. 4, pp. 700-714, 2002. 〈inria-00545109〉


