J. Aggarwal and M. Ryoo, Human activity analysis: A review, ACM Computing Surveys, vol.43, issue.3, pp.16:1-16:43, 2011.
DOI : 10.1145/1922649.1922653

R. Arandjelovic and A. Zisserman, Three things everyone should know to improve object retrieval, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.2911-2918, 2012.
DOI : 10.1109/CVPR.2012.6248018

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.370.7498

H. Bay, T. Tuytelaars, and L. Van Gool, SURF: Speeded up robust features, ECCV, 2006.
DOI : 10.1007/11744023_32

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.679.3046

W. Brendel and S. Todorovic, Learning spatiotemporal graphs of human activities, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126316

K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, Proceedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.76

N. Dalal, B. Triggs, and C. Schmid, Human Detection Using Oriented Histograms of Flow and Appearance, ECCV, 2006.
DOI : 10.1023/A:1008162616689

URL : https://hal.archives-ouvertes.fr/inria-00548587

P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, Behavior Recognition via Sparse Spatio-Temporal Features, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005.
DOI : 10.1109/VSPETS.2005.1570899

G. Farnebäck, Two-Frame Motion Estimation Based on Polynomial Expansion, SCIA, 2003.
DOI : 10.1007/3-540-45103-X_50

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1627-1645, 2010.
DOI : 10.1109/TPAMI.2009.167

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.2745

V. Ferrari, M. Marin-Jimenez, and A. Zisserman, Progressive search space reduction for human pose estimation, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587468

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.321.2867

M. A. Fischler and R. C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM, vol.24, issue.6, pp.381-395, 1981.
DOI : 10.1145/358669.358692

A. Gaidon, Z. Harchaoui, and C. Schmid, Recognizing activities with cluster-trees of tracklets, Proceedings of the British Machine Vision Conference 2012, 2012.
DOI : 10.5244/C.26.30

URL : https://hal.archives-ouvertes.fr/hal-00722955

S. Gauglitz, T. Höllerer, and M. Turk, Evaluation of Interest Point Detectors and Feature Descriptors for Visual Tracking, International Journal of Computer Vision, vol.94, issue.3, pp.335-360, 2011.
DOI : 10.1007/s11263-011-0431-5

M. Jain, H. Jégou, and P. Bouthemy, Better Exploiting Motion for Better Action Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.330

URL : https://hal.archives-ouvertes.fr/hal-00813014

Y. Jiang, Q. Dai, X. Xue, W. Liu, and C. Ngo, Trajectory-based modeling of human actions with motion reference points, ECCV, 2012.

A. Kläser, M. Marszałek, and C. Schmid, A Spatio-Temporal Descriptor Based on 3D-Gradients, Proceedings of the British Machine Vision Conference 2008, 2008.
DOI : 10.5244/C.22.99

O. Kliper-Gross, Y. Gurovich, T. Hassner, and L. Wolf, Motion Interchange Patterns for Action Recognition in Unconstrained Videos, ECCV, 2012.
DOI : 10.1007/978-3-642-33783-3_19

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, HMDB: A large video database for human motion recognition, ICCV, pp.2556-2563, 2011.

I. Laptev, On space-time interest points, International Journal of Computer Vision, vol.64, issue.2-3, pp.107-123, 2005.

I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756

URL : https://hal.archives-ouvertes.fr/inria-00548659

J. Liu, J. Luo, and M. Shah, Recognizing realistic actions from videos in the wild, CVPR, 2009.

M. Marszałek, I. Laptev, and C. Schmid, Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206557

S. Mathe and C. Sminchisescu, Dynamic Eye Movement Datasets and Learnt Saliency Models for Visual Action Recognition, ECCV, 2012.
DOI : 10.1007/978-3-642-33709-3_60

J. C. Niebles, C. Chen, and L. Fei-Fei, Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification, ECCV, 2010.
DOI : 10.1007/978-3-642-15552-9_29

A. Oliva and A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, International Journal of Computer Vision, vol.42, issue.3, pp.145-175, 2001.
DOI : 10.1023/A:1011139631724

D. Oneata, J. Verbeek, and C. Schmid, Action and Event Recognition with Fisher Vectors on a Compact Feature Set, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.228

URL : https://hal.archives-ouvertes.fr/hal-00873662

D. Park, C. L. Zitnick, D. Ramanan, and P. Dollár, Exploring Weak Stabilization for Motion Feature Extraction, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.371

A. Patron-Perez, M. Marszałek, I. Reid, and A. Zisserman, Structured Learning of Human Interactions in TV Shows, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.12, 2012.
DOI : 10.1109/TPAMI.2012.24

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_11

URL : https://hal.archives-ouvertes.fr/inria-00548630

A. Prest, C. Schmid, and V. Ferrari, Weakly Supervised Learning of Interactions between Humans and Objects, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.3, pp.601-614, 2012.
DOI : 10.1109/TPAMI.2011.158

URL : https://hal.archives-ouvertes.fr/inria-00516477

K. Reddy and M. Shah, Recognizing 50 human action categories of web videos, Machine Vision and Applications, pp.1-11, 2012.

S. Sadanand and J. J. Corso, Action bank: A high-level representation of activity in video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247806

P. Scovanner, S. Ali, and M. Shah, A 3-dimensional SIFT descriptor and its application to action recognition, Proceedings of the 15th international conference on Multimedia, MULTIMEDIA '07, 2007.
DOI : 10.1145/1291233.1291311

F. Shi, E. Petriu, and R. Laganiere, Sampling Strategies for Real-Time Action Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.335

J. Shi and C. Tomasi, Good features to track, CVPR, 1994.

B. Solmaz, S. M. Assari, and M. Shah, Classifying web videos using a global video descriptor, Machine Vision and Applications, pp.1-13, 2012.
DOI : 10.1007/s00138-012-0449-x

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.297.4388

R. Szeliski, Image Alignment and Stitching, pp.1-104, 2006.
DOI : 10.1007/0-387-28831-7_17

H. Uemura, S. Ishikawa, and K. Mikolajczyk, Feature Tracking and Motion Compensation for Action Recognition, Proceedings of the British Machine Vision Conference 2008, 2008.
DOI : 10.5244/C.22.30

E. Vig, M. Dorr, and D. Cox, Space-Variant Descriptor Sampling for Action Recognition Based on Saliency and Eye Movements, ECCV, 2012.
DOI : 10.1007/978-3-642-33786-4_7

H. Wang, A. Kläser, C. Schmid, and C. Liu, Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, pp.60-79, 2013.
DOI : 10.1007/s11263-012-0594-8

URL : https://hal.archives-ouvertes.fr/hal-00725627

G. Willems, T. Tuytelaars, and L. Van Gool, An efficient dense and scale-invariant spatio-temporal interest point detector, ECCV, 2008.

S. Wu, O. Oreifej, and M. Shah, Action recognition in videos acquired by a moving camera using motion decomposition of Lagrangian particle trajectories, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126397

L. Yeffet and L. Wolf, Local Trinary Patterns for human action recognition, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459201