M. Andriluka, S. Roth, and B. Schiele, Pictorial structures revisited: People detection and articulated pose estimation, 2009 IEEE Conference on Computer Vision and Pattern Recognition, p.CVPR, 2009.
DOI : 10.1109/CVPR.2009.5206754

M. Andriluka, S. Roth, and B. Schiele, Monocular 3D pose estimation and tracking by detection, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p.CVPR, 2010.
DOI : 10.1109/CVPR.2010.5540156

O. Barinova, V. Lempitsky, E. Tretyak, and P. Kohli, Geometric Image Parsing in Man-Made Environments, p.ECCV, 2010.
DOI : 10.1007/978-3-642-15552-9_5

L. Bourdev and J. Malik, Poselets: Body part detectors trained using 3D human pose annotations, 2009 IEEE 12th International Conference on Computer Vision, p.ICCV, 2009.
DOI : 10.1109/ICCV.2009.5459303

W. Choi, Y. W. Chao, C. Pantofaru, and S. Savarese, Understanding Indoor Scenes Using 3D Geometric Phrases, 2013 IEEE Conference on Computer Vision and Pattern Recognition, p.CVPR, 2013.
DOI : 10.1109/CVPR.2013.12

J. Coughlan and A. Yuille, The Manhattan world assumption: Regularities in scene statistics which enable bayesian inference, p.NIPS, 2000.

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), p.CVPR, 2005.
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512

L. Del Pero, J. Bowdish, D. Fried, B. Kermgard, E. Hartley et al., Bayesian geometric modeling of indoor scenes, p.CVPR, 2012.

L. Del Pero, J. Guan, E. Brau, J. Schlecht, and K. Barnard, Sampling bedrooms, p.CVPR, 2011.

V. Delaitre, D. Fouhey, I. Laptev, J. Sivic, A. Efros et al., Scene Semantics from Long-Term Observation of People, p.ECCV, 2012.
DOI : 10.1007/978-3-642-33783-3_21
URL : https://hal.archives-ouvertes.fr/hal-01060880

V. Delaitre, J. Sivic, and I. Laptev, Learning person-object interactions for action recognition in still images, p.NIPS, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00648156

C. Desai, D. Ramanan, and C. Fowlkes, Discriminative models for static human-object interactions, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Workshops, 2010.
DOI : 10.1109/CVPRW.2010.5543176

B. Efron, Better Bootstrap Confidence Intervals, Journal of the American Statistical Association, vol.82, issue.397, pp.171-185, 1987.
DOI : 10.1080/01621459.1987.10478410
URL : http://www.dtic.mil/get-tr-doc/pdf?AD=ADA150798

P. Felzenszwalb, D. McAllester, and D. Ramanan, A discriminatively trained, multiscale, deformable part model, 2008 IEEE Conference on Computer Vision and Pattern Recognition, p.CVPR, 2008.
DOI : 10.1109/CVPR.2008.4587597

A. Flint, D. Murray, and I. Reid, Manhattan scene understanding using monocular, stereo, and 3D features, 2011 International Conference on Computer Vision, p.ICCV, 2011.
DOI : 10.1109/ICCV.2011.6126501

D. F. Fouhey, V. Delaitre, A. Gupta, A. A. Efros, I. Laptev et al., People watching: Human actions as a cue for single-view geometry, p.ECCV, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01060874

D. F. Fouhey, A. Gupta, and M. Hebert, Data-Driven 3D Primitives for Single Image Understanding, 2013 IEEE International Conference on Computer Vision, p.ICCV, 2013.
DOI : 10.1109/ICCV.2013.421

J. Gall, A. Fossati, and L. Van Gool, Functional categorization of objects using real-time markerless motion capture, CVPR 2011, p.CVPR, 2011.
DOI : 10.1109/CVPR.2011.5995582

J. Gibson, The ecological approach to visual perception, 1979.

L. Guan, J. S. Franco, and M. Pollefeys, 3D occlusion inference from silhouette cues, p.CVPR, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00348980

A. Gupta, T. Chen, F. Chen, D. Kimber, and L. Davis, Context and observation driven latent variable model for human pose estimation, 2008 IEEE Conference on Computer Vision and Pattern Recognition, p.CVPR, 2008.
DOI : 10.1109/CVPR.2008.4587511

A. Gupta and L. S. Davis, Objects in Action: An Approach for Combining Action Understanding and Object Perception, 2007 IEEE Conference on Computer Vision and Pattern Recognition, p.CVPR, 2007.
DOI : 10.1109/CVPR.2007.383331

A. Gupta, A. Efros, and M. Hebert, Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics, p.ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_35
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.170.6991

A. Gupta, S. Satkin, A. Efros, and M. Hebert, From 3D scene geometry to human workspace, CVPR 2011, p.CVPR, 2011.
DOI : 10.1109/CVPR.2011.5995448
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.220.2094

R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed., Cambridge University Press, ISBN 0521540518, 2004.

V. Hedau, D. Hoiem, and D. Forsyth, Recovering the spatial layout of cluttered rooms, 2009 IEEE 12th International Conference on Computer Vision, p.ICCV, 2009.
DOI : 10.1109/ICCV.2009.5459411

V. Hedau, D. Hoiem, and D. Forsyth, Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry, p.ECCV, 2010.
DOI : 10.1007/978-3-642-15567-3_17

D. Hoiem, A. Efros, and M. Hebert, Geometric context from a single image, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, p.ICCV, 2005.
DOI : 10.1109/ICCV.2005.107

D. Hoiem, A. Efros, and M. Hebert, Putting objects in perspective, IJCV, 2008.

Y. Jiang and A. Saxena, Hallucinated Humans as the Hidden Context for Labeling 3D Scenes, 2013 IEEE Conference on Computer Vision and Pattern Recognition, p.CVPR, 2013.
DOI : 10.1109/CVPR.2013.385

S. Johnson and M. Everingham, Learning effective human pose estimation from inaccurate annotation, CVPR 2011, p.CVPR, 2011.
DOI : 10.1109/CVPR.2011.5995318
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.638.2045

T. Kanade, Recovery of the three-dimensional shape of an object from a single view, Artificial Intelligence, vol.17, issue.1-3, pp.409-460, 1981.
DOI : 10.1016/0004-3702(81)90031-X

K. Karsch, C. Liu, and S. B. Kang, Depth Extraction from Video Using Non-parametric Sampling, p.ECCV, 2012.
DOI : 10.1007/978-3-642-33715-4_56

H. Kjellstrom, J. Romero, D. Martinez, and D. Kragic, Simultaneous Visual Recognition of Manipulation Actions and Manipulated Objects, p.ECCV, 2008.
DOI : 10.1007/978-3-540-88688-4_25

N. Krahnstoever and P. R. Mendonca, Bayesian autocalibration for surveillance, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, p.ICCV, 2005.
DOI : 10.1109/ICCV.2005.44

D. Lee, A. Gupta, M. Hebert, and T. Kanade, Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces, p.NIPS, 2010.

D. Lee, M. Hebert, and T. Kanade, Geometric reasoning for single image structure recovery, 2009 IEEE Conference on Computer Vision and Pattern Recognition, p.CVPR, 2009.
DOI : 10.1109/CVPR.2009.5206872

D. Park and D. Ramanan, N-best maximal decoders for part models, 2011 International Conference on Computer Vision, p.ICCV, 2011.
DOI : 10.1109/ICCV.2011.6126552

N. Payet and S. Todorovic, Scene shape from texture of objects, CVPR 2011, p.CVPR, 2011.
DOI : 10.1109/CVPR.2011.5995326

A. Prest, C. Schmid, and V. Ferrari, Weakly Supervised Learning of Interactions between Humans and Objects, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.3, 2011.
DOI : 10.1109/TPAMI.2011.158
URL : https://hal.archives-ouvertes.fr/inria-00516477

V. Ramakrishna, T. Kanade, and Y. Sheikh, Tracking Human Pose by Tracking Symmetric Parts, 2013 IEEE Conference on Computer Vision and Pattern Recognition, p.CVPR, 2013.
DOI : 10.1109/CVPR.2013.478
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.681.4180

C. Rother, A new approach to vanishing point detection in architectural environments, Image and Vision Computing, vol.20, issue.9-10, 2002.
DOI : 10.1016/S0262-8856(02)00054-9

D. Rother, K. Patwardhan, and G. Sapiro, What Can Casual Walkers Tell Us About A 3D Scene?, 2007 IEEE 11th International Conference on Computer Vision, p.ICCV, 2007.
DOI : 10.1109/ICCV.2007.4409082

A. Saxena, M. Sun, and A. Y. Ng, Make3D: Learning 3D Scene Structure from a Single Still Image, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.5, 2008.
DOI : 10.1109/TPAMI.2008.132

A. Schodl and I. Essa, Depth layers from occlusions, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, p.CVPR, 2001.
DOI : 10.1109/CVPR.2001.990534
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.2835

A. G. Schwing, S. Fidler, M. Pollefeys, and R. Urtasun, Box in the Box: Joint 3D Layout and Object Reasoning from Single Images, 2013 IEEE International Conference on Computer Vision, p.ICCV, 2013.
DOI : 10.1109/ICCV.2013.51

A. G. Schwing and R. Urtasun, Efficient Exact Inference for 3D Indoor Scene Understanding, p.ECCV, 2012.
DOI : 10.1007/978-3-642-33783-3_22

C. J. Taylor, Reconstruction of articulated objects from point correspondences in a single image, p.CVPR, 2000.

M. Turek, A. Hoogs, and R. Collins, Unsupervised Learning of Functional Categories in Video Scenes, p.ECCV, 2010.
DOI : 10.1007/978-3-642-15552-9_48

H. Wang, S. Gould, and D. Koller, Discriminative learning with latent variables for cluttered indoor scene understanding, Communications of the ACM, vol.56, issue.4, 2013.
DOI : 10.1145/2436256.2436276

J. Xiao, B. Russell, and A. Torralba, Localizing 3D cuboids in single-view images, p.NIPS, 2012.

Y. Yang and D. Ramanan, Articulated pose estimation using flexible mixtures of parts, p.CVPR, 2011.

B. Yao, A. Khosla, and L. Fei-Fei, Classifying actions and measuring action similarity by modeling the mutual context of objects and human poses, Proc. ICML, 2011.

S. X. Yu, H. Zhang, and J. Malik, Inferring spatial layout from a single image via depth-ordered grouping, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008.
DOI : 10.1109/CVPRW.2008.4562977