A. Gupta, A. Kembhavi, and L. Davis, Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.10, 2009.
DOI : 10.1109/TPAMI.2009.83

M. Everingham, L. Van-gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, 2010.
DOI : 10.1007/s11263-009-0275-4

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., 2004.
DOI : 10.1109/ICPR.2004.1334462

I. Laptev and P. Perez, Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409105

K. Mikolajczyk and H. Uemura, Action recognition with motion-appearance vocabulary forest, In: CVPR, 2008.

J. Sullivan and S. Carlsson, Recognizing and Tracking Human Action, In: ECCV, 2002.
DOI : 10.1007/3-540-47969-4_42

N. Ikizler-cinbis, G. Cinbis, and S. Sclaroff, Learning actions from the Web, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459368

P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, Behavior Recognition via Sparse Spatio-Temporal Features, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005.
DOI : 10.1109/VSPETS.2005.1570899

I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756

URL : https://hal.archives-ouvertes.fr/inria-00548659

G. Willems, J. H. Becker, T. Tuytelaars, and L. Van-gool, Exemplar-based action recognition in video, 2009.

C. Thurau and V. Hlavac, Pose primitive based human action recognition in videos or still images, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587721

B. Yao and L. Fei-fei, Modeling mutual context of object and human pose in human-object interaction activities, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540235

B. Yao and L. Fei-fei, Grouplet: A structured image representation for recognizing human and object interactions, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540234

Fig. Example results on the PASCAL Action 2010 test set [2]. Each column shows two images from the validation set for the same class. From left to right: 'playing instrument', 'reading', 'taking photo', 'riding horse' and 'walking'.

C. Desai, D. Ramanan, and C. Fowlkes, Discriminative models for static human-object interactions, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Workshops, 2010.
DOI : 10.1109/CVPRW.2010.5543176

C. Desai, D. Ramanan, and C. Fowlkes, Discriminative models for multi-class object layout, In: ICCV, 2009.

A. Gupta, T. Chen, F. Chen, D. Kimber, and L. Davis, Context and observation driven latent variable model for human pose estimation, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587511

N. Ikizler-cinbis and S. Sclaroff, Object, Scene and Actions: Combining Multiple Features for Human Action Recognition, 2010.
DOI : 10.1007/978-3-642-15549-9_36

R. Fergus, P. Perona, and A. Zisserman, Object class recognition by unsupervised scale-invariant learning, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., 2003.
DOI : 10.1109/CVPR.2003.1211479

J. Winn, A. Criminisi, and T. Minka, Object categorization by learned universal visual dictionary, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.171

T. Deselaers, B. Alexe, and V. Ferrari, Localizing Objects While Learning Their Appearance, In: ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_33

P. F. Felzenszwalb, R. B. Girshick, D. Mcallester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, 2009.
DOI : 10.1109/TPAMI.2009.167

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.2745

M. Everingham, L. Van-gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, 2007.
DOI : 10.1007/s11263-009-0275-4

V. Ferrari, M. Marin-jimenez, and A. Zisserman, Progressive search space reduction for human pose estimation, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587468

Y. Rodriguez, Face Detection and Verification using Local Binary Patterns, 2006.

P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001.
DOI : 10.1109/CVPR.2001.990517

G. Heusch, Y. Rodriguez, and S. Marcel, Local Binary Patterns as an Image Preprocessing for Face Authentication, 7th International Conference on Automatic Face and Gesture Recognition (FGR06), 2006.
DOI : 10.1109/FGR.2006.72

D. Comaniciu, V. Ramesh, and P. Meer, The variable bandwidth mean shift and data-driven scale selection, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, 2001.
DOI : 10.1109/ICCV.2001.937550

M. Eichner and V. Ferrari, Better appearance models for pictorial structures, Proceedings of the British Machine Vision Conference 2009, 2009.
DOI : 10.5244/C.23.3

R. Fergus and P. Perona, Caltech object category datasets, 2003.

M. Everingham, L. Van-gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, 2008.
DOI : 10.1007/s11263-009-0275-4

B. Alexe, T. Deselaers, and V. Ferrari, What is an object?, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540226

V. Kolmogorov, Convergent Tree-Reweighted Message Passing for Energy Minimization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, issue.10, pp.1568-1583, 2006.
DOI : 10.1109/TPAMI.2006.200

H. Bay, A. Ess, T. Tuytelaars, and L. Van-gool, SURF: Speeded up robust features, CVIU, vol.110, pp.346-359, 2008.

Z. Botev, Nonparametric density estimation via diffusion mixing. The University of Queensland, Postgraduate Series, 2007.

J. Zhang, M. Marszałek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, International Journal of Computer Vision, vol.36, issue.1, 2007.
DOI : 10.1007/s11263-006-9794-4

URL : https://hal.archives-ouvertes.fr/inria-00548574

A. Oliva and A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope, International Journal of Computer Vision, vol.42, issue.3, pp.145-175, 2001.
DOI : 10.1023/A:1011139631724

L. J. Li and L. Fei-fei, What, where and who? Classifying events by scene and object recognition, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4408872

P. Gehler and S. Nowozin, On feature combination for multiclass object classification, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459169

M. Grubinger, P. D. Clough, H. Müller, and T. Deselaers, The IAPR benchmark: A new evaluation resource for visual information systems, In: LREC, 2006.

R. Johansson and P. Nugues, Dependency-based syntactic-semantic analysis with PropBank and NomBank, Proceedings of the Twelfth Conference on Computational Natural Language Learning, CoNLL '08, 2008.
DOI : 10.3115/1596324.1596355