W. Brendel and S. Todorovic, Activities as Time Series of Human Postures, ECCV, 2010.
DOI : 10.1007/978-3-642-15552-9_52

L. Cao, Y. Mu, A. Natsev, S. Chang, G. Hua et al., Scene Aligned Pooling for Complex Video Recognition, ECCV, 2012.
DOI : 10.1007/978-3-642-33709-3_49

K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, Proceedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.76

O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459279

M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4

A. Gaidon, Z. Harchaoui, and C. Schmid, Actom sequence models for efficient action detection, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995646

URL : https://hal.archives-ouvertes.fr/inria-00575217

A. Gaidon, Z. Harchaoui, and C. Schmid, Recognizing activities with cluster-trees of tracklets, Proceedings of the British Machine Vision Conference 2012, 2012.
DOI : 10.5244/C.26.30

URL : https://hal.archives-ouvertes.fr/hal-00722955

A. Gupta, A. Kembhavi, and L. Davis, Observing human-object interactions: using spatial and functional compatibility for recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.10, pp.1775-1789, 2009.

N. Ikizler-Cinbis and S. Sclaroff, Object, Scene and Actions: Combining Multiple Features for Human Action Recognition, ECCV, 2010.
DOI : 10.1007/978-3-642-15549-9_36

H. Izadinia and M. Shah, Recognizing Complex Events Using Large Margin Joint Low-Level Event Model, ECCV, 2012.
DOI : 10.1007/978-3-642-33765-9_31

M. Jain, H. Jégou, and P. Bouthemy, Better Exploiting Motion for Better Action Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.330

URL : https://hal.archives-ouvertes.fr/hal-00813014

Y. Jiang, Q. Dai, X. Xue, W. Liu, and C. Ngo, Trajectory-Based Modeling of Human Actions with Motion Reference Points, ECCV, 2012.
DOI : 10.1007/978-3-642-33715-4_31

A. Kläser, M. Marszałek, C. Schmid, and A. Zisserman, Human Focused Action Localization in Video, ECCV Workshop on Sign, Gesture, and Activity, 2010.
DOI : 10.1007/978-3-642-35749-7_17

O. Kliper-Gross, Y. Gurovich, T. Hassner, and L. Wolf, Motion Interchange Patterns for Action Recognition in Unconstrained Videos, ECCV, 2012.
DOI : 10.1007/978-3-642-33783-3_19

J. Krapac, J. Verbeek, and F. Jurie, Modeling spatial layout with fisher vectors for image categorization, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126406

URL : https://hal.archives-ouvertes.fr/inria-00612277

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126543

URL : http://cbcl.mit.edu/publications/ps/Kuehne_etal_iccv11.pdf

I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756

URL : https://hal.archives-ouvertes.fr/inria-00548659

I. Laptev and P. Pérez, Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409105

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.68

URL : https://hal.archives-ouvertes.fr/inria-00548585

Q. Le, W. Zou, S. Yeung, and A. Ng, Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995496

J. Liu, J. Luo, and M. Shah, Recognizing realistic actions from videos, CVPR, 2009.

D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.4931

M. Marszałek, I. Laptev, and C. Schmid, Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206557

URL : https://hal.archives-ouvertes.fr/inria-00548645

S. Mathe and C. Sminchisescu, Dynamic eye movement datasets and learnt saliency models for visual action recognition, ECCV, 2012.

P. Matikainen, M. Hebert, and R. Sukthankar, Representing pairwise spatial and temporal relations for action recognition, ECCV, 2010.

S. McCann and D. Lowe, Spatially Local Coding for Object Recognition, ACCV, 2012.
DOI : 10.1007/978-3-642-37331-2_16

P. Natarajan, S. Wu, S. Vitaladevuni, X. Zhuang, S. Tsakalidis et al., Multimodal feature fusion for robust event detection in web videos, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247814

J. Niebles, C. Chen, and F. Li, Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification, ECCV, 2010.
DOI : 10.1007/978-3-642-15552-9_29

P. Over, G. Awad, M. Michel, J. Fiscus, G. Sanders et al., TRECVID 2012 - an overview of the goals, tasks, data, evaluation mechanisms and metrics, Proceedings of TRECVID, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00953826

F. Perronnin, J. Sánchez, and Y. Liu, Large-scale image categorization with explicit data embedding, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539914

A. Prest, V. Ferrari, and C. Schmid, Explicit Modeling of Human-Object Interactions in Realistic Videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.4, pp.835-848, 2013.
DOI : 10.1109/TPAMI.2012.175

URL : https://hal.archives-ouvertes.fr/hal-00720847

L. Rabiner and R. Schafer, Introduction to Digital Speech Processing, Foundations and Trends® in Signal Processing, vol.1, issue.1-2, pp.1-194, 2007.
DOI : 10.1561/2000000001

K. Reddy and M. Shah, Recognizing 50 human action categories of web videos, Machine Vision and Applications Journal, 2012.
DOI : 10.1007/s00138-012-0450-4

J. Sánchez, F. Perronnin, and T. de Campos, Modeling the spatial layout of images beyond spatial pyramids, Pattern Recognition Letters, vol.33, issue.16, pp.2216-2223, 2012.
DOI : 10.1016/j.patrec.2012.07.019

J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, Image Classification with the Fisher Vector: Theory and Practice, International Journal of Computer Vision, vol.105, issue.3, pp.222-245, 2013.
DOI : 10.1007/s11263-013-0636-x

URL : https://hal.archives-ouvertes.fr/hal-00779493

M. Sapienza, F. Cuzzolin, and P. Torr, Learning discriminative space-time actions from weakly labelled videos, BMVC, 2012.
DOI : 10.5244/c.26.123

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.301.6470

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., 2004.
DOI : 10.1109/ICPR.2004.1334462

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.173.6790

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238663

C. Sun and R. Nevatia, Large-scale web video event classification by use of Fisher Vectors, 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013.
DOI : 10.1109/WACV.2013.6474994

K. Tang, L. Fei-Fei, and D. Koller, Learning latent temporal structure for complex event detection, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247808

H. Wang, A. Kläser, C. Schmid, and C. Liu, Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, pp.60-79, 2013.
DOI : 10.1007/s11263-012-0594-8

URL : https://hal.archives-ouvertes.fr/hal-00725627

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.441

URL : https://hal.archives-ouvertes.fr/hal-00873267

H. Wang, M. Ullah, A. Kläser, I. Laptev, and C. Schmid, Evaluation of local spatio-temporal features for action recognition, Proceedings of the British Machine Vision Conference 2009, 2009.
DOI : 10.5244/C.23.124

URL : https://hal.archives-ouvertes.fr/inria-00439769

X. Wang, L. Wang, and Y. Qiao, A Comparative Study of Encoding, Pooling and Normalization Methods for Action Recognition, ACCV, 2012.
DOI : 10.1007/978-3-642-37431-9_44

Y. Yang and M. Shah, Complex events detection using data-driven concepts, ECCV, 2012.
DOI : 10.1007/978-3-642-33712-3_52

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.258.6996