O. Boiman and M. Irani, Similarity by composition, NIPS, 2006.

O. Boiman, E. Shechtman, and M. Irani, In defense of nearest-neighbor based image classification, CVPR, 2008.

L. Bourdev, S. Maji, and J. Malik, Describing people: A poselet-based approach to attribute classification, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126413

L. Bourdev and J. Malik, Poselets: Body part detectors trained using 3D human pose annotations, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459303

G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, Intl. Workshop on Stat. Learning in Comp. Vision, 2004.

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), p.3, 2005.
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512

V. Delaitre, I. Laptev, and J. Sivic, Recognizing human actions in still images: a study of bag-of-features and part-based representations, Proceedings of the British Machine Vision Conference 2010, 2010.
DOI : 10.5244/C.24.97
URL : https://hal.archives-ouvertes.fr/hal-01060885

V. Delaitre, J. Sivic, and I. Laptev, Learning person-object interactions for action recognition in still images, NIPS, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00648156

C. Desai and D. Ramanan, Detecting Actions, Poses, and Objects with Relational Phraselets, ECCV, 2012, p.3
DOI : 10.1007/978-3-642-33765-9_12

C. Desai, D. Ramanan, and C. Fowlkes, Discriminative models for static human-object interactions, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Workshops, 2010.
DOI : 10.1109/CVPRW.2010.5543176

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, LIBLINEAR: A library for large linear classification, JMLR, vol.9, issue.5, pp.1871-1874, 2008.

P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1627-1645, 2010.
DOI : 10.1109/TPAMI.2009.167

R. Fergus, P. Perona, and A. Zisserman, Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition, International Journal of Computer Vision, vol.71, issue.3, pp.273-303, 2007.
DOI : 10.1007/s11263-006-8707-x

A. Gupta, A. Kembhavi, and L. S. Davis, Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.10, pp.1775-1789, 2009.
DOI : 10.1109/TPAMI.2009.83

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.68
URL : https://hal.archives-ouvertes.fr/inria-00548585

B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and Segmentation, International Journal of Computer Vision, vol.77, issue.1-3, pp.259-289, 2008.
DOI : 10.1007/s11263-007-0095-3

L. Li, H. Su, E. Xing, and L. Fei-Fei, Object bank: A high-level image representation for scene classification and semantic feature sparsification, NIPS, p.7, 2010.

D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.4931

S. Maji, L. Bourdev, and J. Malik, Action recognition from a distributed representation of pose and appearance, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995631

T. Malisiewicz, A. Gupta, and A. Efros, Ensemble of exemplar-SVMs for object detection and beyond, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126229

M. Pandey and S. Lazebnik, Scene recognition and weakly supervised object localization with deformable part-based models, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126383

F. Perronnin, Z. Akata, Z. Harchaoui, and C. Schmid, Towards good practice in large-scale learning for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, p.5, 2012.
DOI : 10.1109/CVPR.2012.6248090
URL : https://hal.archives-ouvertes.fr/hal-00690014

A. Prest, C. Schmid, and V. Ferrari, Weakly Supervised Learning of Interactions between Humans and Objects, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.3, 2011.
DOI : 10.1109/TPAMI.2011.158
URL : https://hal.archives-ouvertes.fr/inria-00516477

G. Sharma and F. Jurie, Learning discriminative spatial representation for image classification, Proceedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.6
URL : https://hal.archives-ouvertes.fr/hal-00722820

G. Sharma, F. Jurie, and C. Schmid, Discriminative spatial saliency for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248093
URL : https://hal.archives-ouvertes.fr/hal-00714311

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238663

A. Vedaldi and B. Fulkerson, VLFeat, Proceedings of the international conference on Multimedia, MM '10, 2010.
DOI : 10.1145/1873951.1874249

A. Vedaldi and A. Zisserman, Efficient additive kernels using explicit feature maps, CVPR, 2010.
DOI : 10.1109/cvpr.2010.5539949
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.7024

J. Wang, J. Yang, K. Yu, F. Lv, T. Huang et al., Locality-constrained linear coding for image classification, CVPR, 2010.

S. Yan, X. Xu, D. Xu, S. Lin, and X. Li, Beyond Spatial Pyramids: A New Feature Extraction Framework with Dense Spatial Sampling for Image Classification, ECCV, pp.473-487, 2012.
DOI : 10.1007/978-3-642-33765-9_34

W. Yang, Y. Wang, and G. Mori, Recognizing human actions from still images with latent poses, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539879
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.3890

Y. Yang and D. Ramanan, Articulated pose estimation with flexible mixtures-of-parts, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995741

B. Yao and L. Fei-Fei, Grouplet: A structured image representation for recognizing human and object interactions, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540234

B. Yao and L. Fei-Fei, Modeling mutual context of object and human pose in human-object interaction activities, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540235

B. Yao, X. Jiang, A. Khosla, A. L. Lin, L. J. Guibas et al., Human action recognition by learning bases of action attributes and parts, 2011 International Conference on Computer Vision, p.7, 2011.
DOI : 10.1109/ICCV.2011.6126386

B. Yao, A. Khosla, and L. Fei-Fei, Combining randomization and discrimination for fine-grained image categorization, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995368

P. Zhu, L. Zhang, Q. Hu, and S. Shiu, Multi-scale Patch Based Collaborative Representation for Face Recognition with Margin Distribution Optimization, ECCV, 2012.
DOI : 10.1007/978-3-642-33718-5_59

X. Zhu and D. Ramanan, Face detection, pose estimation, and landmark localization in the wild, CVPR, IEEE, p.3, 2012.

X. Zhu, C. Vondrick, D. Ramanan, and C. Fowlkes, Do We Need More Training Data or Better Models for Object Detection?, Proceedings of the British Machine Vision Conference 2012, p.7, 2012.
DOI : 10.5244/C.26.80
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.259.7748