B. Tversky, J. Morrison, and J. Zacks, On bodies and events, The Imitative Mind, 2002.
DOI : 10.1017/CBO9780511489969.013

I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756

URL : https://hal.archives-ouvertes.fr/inria-00548659

J. C. Niebles, H. Wang, and L. Fei-Fei, Unsupervised learning of human action categories using spatial-temporal words, International Journal of Computer Vision, vol.79, issue.3, pp.299-318, 2008.

C. Schüldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, ICPR, 2004.

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.441

URL : https://hal.archives-ouvertes.fr/hal-00873267

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, NIPS, 2014.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, NIPS, 2012.

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.299.205

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A large-scale hierarchical image database, CVPR, 2009.

B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, Learning deep features for scene recognition using places database, NIPS, 2014.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.81

URL : http://arxiv.org/abs/1311.2524

Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, DeepFace: Closing the Gap to Human-Level Performance in Face Verification, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.220

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar et al., Large-Scale Video Classification with Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.223

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.471.3312

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning Spatiotemporal Features with 3D Convolutional Networks, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.510

URL : http://arxiv.org/abs/1412.0767

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, CVPR, 2015.
DOI : 10.1109/tpami.2016.2599174

URL : http://arxiv.org/abs/1411.4389

S. Ji, W. Xu, M. Yang, and K. Yu, 3D Convolutional Neural Networks for Human Action Recognition, ICML, 2010.
DOI : 10.1109/TPAMI.2012.59

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.169.4046

G. W. Taylor, R. Fergus, Y. LeCun, and C. Bregler, Convolutional Learning of Spatio-temporal Features, ECCV, 2010.
DOI : 10.1007/978-3-642-15567-3_11

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.178.9267

G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, ECCVW, 2004.

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_11

URL : https://hal.archives-ouvertes.fr/inria-00548630

B. Fernando, E. Gavves, J. Oramas, A. Ghodrati, and T. Tuytelaars, Modeling video evolution for action recognition, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299176

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. Howard et al., Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, vol.1, issue.4, pp.541-551, 1989.
DOI : 10.1162/neco.1989.1.4.541

L. Wang, Y. Qiao, and X. Tang, Action recognition with trajectory-pooled deep-convolutional descriptors, CVPR, 2015.
DOI : 10.1109/cvpr.2015.7299059

URL : http://arxiv.org/abs/1505.04868

L. Wang, Y. Xiong, Z. Wang, and Y. Qiao, Towards good practices for very deep two-stream convnets, 2015.

URL : http://arxiv.org/abs/1507.02159

H. Bilen, B. Fernando, E. Gavves, A. Vedaldi, and S. Gould, Dynamic Image Networks for Action Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.331

C. Feichtenhofer, A. Pinz, and A. Zisserman, Convolutional two-stream network fusion for video action recognition, CVPR, 2016.
DOI : 10.1109/cvpr.2016.213

URL : http://arxiv.org/abs/1604.06573

B. Zhang, L. Wang, Z. Wang, Y. Qiao, and H. Wang, Real-Time Action Recognition with Enhanced Motion Vector CNNs, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.297

URL : http://arxiv.org/abs/1604.07669

V. Kantorov and I. Laptev, Efficient Feature Extraction, Encoding, and Classification for Action Recognition, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.332

URL : https://hal.archives-ouvertes.fr/hal-01058734

G. Farnebäck, Two-Frame Motion Estimation Based on Polynomial Expansion, SCIA, 2003.
DOI : 10.1007/3-540-45103-X_50

T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, High Accuracy Optical Flow Estimation Based on a Theory for Warping, ECCV, 2004.
DOI : 10.1007/978-3-540-24673-2_3

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.1732

K. Soomro, A. Zamir, and M. Shah, UCF101: A dataset of 101 human actions classes from videos in the wild, 2012.

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126543

URL : http://cbcl.mit.edu/publications/ps/Kuehne_etal_iccv11.pdf

Z. Lan, M. Lin, X. Li, A. G. Hauptmann, and B. Raj, Beyond Gaussian pyramid: Multi-skip feature stacking for action recognition, CVPR, 2015.

J. Y. Ng, M. J. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga et al., Beyond short snippets: Deep networks for video classification, CVPR, 2015.

M. D. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV, 2014.
DOI : 10.1007/978-3-319-10590-1_53

URL : http://arxiv.org/abs/1311.2901