Structured Output Prediction and Learning for Deep Monocular 3D Human Pose Estimation

Abstract: In this work we address the problem of estimating 3D human pose from a single RGB image by blending a feed-forward CNN with a graphical model that couples the 3D positions of parts. The CNN populates a volumetric output space that represents the possible positions of 3D human joints, and also regresses the estimated displacements between pairs of parts. These constitute the 'unary' and 'pairwise' terms of the energy of a graphical model that resides in a 3D label space and delivers an optimal 3D pose configuration at its output. The CNN is trained on the Human 3.6M 3D human pose dataset, and the graphical model is trained jointly with the CNN in an end-to-end manner, allowing us to exploit both the discriminative power of CNNs and the top-down information pertaining to human pose. We (a) introduce memory-efficient methods for obtaining accurate voxel estimates for parts by blending quantization with regression, (b) employ efficient structured prediction algorithms for 3D pose estimation using branch-and-bound, and (c) develop a framework for qualitative and quantitative comparison of competing graphical models. We evaluate our work on the Human 3.6M dataset, demonstrating that exploiting the structure of the human pose in 3D yields systematic gains.
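To make the roles of the 'unary' and 'pairwise' terms concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give the exact energy, so the quadratic pairwise penalty, the `weight` parameter, and the function names `decode_position`, `pose_score` and `best_pose` are all assumptions made for illustration. The exhaustive search in `best_pose` is a simple stand-in for the branch-and-bound inference named in the abstract, and is only practical for tiny candidate sets.

```python
from itertools import product


def decode_position(voxel_index, residual, voxel_size):
    """Blend quantization with regression (assumed form): the continuous
    coordinate is the voxel centre plus a regressed offset in voxel units."""
    return tuple((i + 0.5 + r) * voxel_size
                 for i, r in zip(voxel_index, residual))


def pose_score(positions, unary, displacements, weight=1.0):
    """Unary scores of the chosen cells minus a weighted squared deviation
    from the expected part-to-part displacements (assumed quadratic)."""
    score = sum(unary[part][pos] for part, pos in positions.items())
    for (a, b), expected in displacements.items():
        actual = tuple(q - p for p, q in zip(positions[a], positions[b]))
        score -= weight * sum((x - e) ** 2 for x, e in zip(actual, expected))
    return score


def best_pose(unary, displacements, weight=1.0):
    """Exhaustive search over candidate cells per part.
    Stand-in for branch-and-bound; exponential in the number of parts."""
    parts = sorted(unary)
    best = None
    for combo in product(*(unary[p].keys() for p in parts)):
        positions = dict(zip(parts, combo))
        s = pose_score(positions, unary, displacements, weight)
        if best is None or s > best[0]:
            best = (s, positions)
    return best


# Toy example: two parts, candidate cells given as voxel indices.
unary = {
    "hip": {(0, 0, 0): 1.0, (1, 0, 0): 0.9},
    "knee": {(0, 0, 0): 0.2, (0, -1, 0): 0.1, (1, -1, 0): 0.15},
}
# Hypothetical regressed displacement from hip to knee, in voxel units.
displacements = {("hip", "knee"): (0, -1, 0)}

score, pose = best_pose(unary, displacements)
```

In this toy example the knee cell with the highest unary score on its own is rejected because it contradicts the expected hip-to-knee displacement, which is the kind of structural correction the pairwise term is meant to provide.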
Document type:
Conference paper
EMMCVPR 2017 - 11th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition, Oct 2017, Venice, Italy. pp.1-14

Cited literature [37 references]

https://hal.inria.fr/hal-01672592
Contributor: Stefan Kinauer
Submitted on: Tuesday, December 26, 2017 - 12:43:34
Last modified on: Thursday, February 7, 2019 - 17:29:11
Document(s) archived on: Tuesday, March 27, 2018 - 12:12:03

File

camera ready (1).pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01672592, version 1

Citation

Stefan Kinauer, Riza Güler, Siddhartha Chandra, Iasonas Kokkinos. Structured Output Prediction and Learning for Deep Monocular 3D Human Pose Estimation. EMMCVPR 2017 - 11th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition, Oct 2017, Venice, Italy. pp.1-14. 〈hal-01672592〉
