K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, Advances in Neural Information Processing Systems, pp.568-576, 2014.

J. Wang, Z. Liu, Y. Wu, and J. Yuan, Learning actionlet ensemble for 3d human action recognition, IEEE Conference on Computer Vision and Pattern Recognition, pp.1290-1297, 2012.

K. Soomro, A. R. Zamir, and M. Shah, UCF101: A dataset of 101 human actions classes from videos in the wild, arXiv:1212.0402, 2012.

H. Fujiyoshi and A. J. Lipton, Real-time human motion analysis by image skeletonization, Proceedings Fourth IEEE Workshop on Applications of Computer Vision (WACV'98), p.15, 1998.
DOI : 10.1109/ACV.1998.732852

P. Wei, N. Zheng, Y. Zhao, and S. C. Zhu, Concurrent Action Detection with Structural Prediction, 2013 IEEE International Conference on Computer Vision, pp.3136-3143, 2013.
DOI : 10.1109/ICCV.2013.389

R. Chaudhry, A. Ravichandran, G. Hager, and R. Vidal, Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions, 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp.1932-1939, 2009.
DOI : 10.1109/CVPR.2009.5206821

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.886-893, 2005.
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512

D. G. Lowe, Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999.
DOI : 10.1109/ICCV.1999.790410

C. Schüldt, I. Laptev, and B. Caputo, Recognizing human actions: A local SVM approach, International Conference on Pattern Recognition, pp.32-36, 2004.

H. Wang, A. Kläser, C. Schmid, and C. L. Liu, Action recognition by dense trajectories, CVPR 2011, pp.3169-3176, 2011.
DOI : 10.1109/CVPR.2011.5995407
URL : https://hal.archives-ouvertes.fr/inria-00583818

A. Kläser, M. Marszalek, and C. Schmid, A Spatio-Temporal Descriptor Based on 3D-Gradients, Proceedings of the British Machine Vision Conference 2008, 2008.
DOI : 10.5244/C.22.99

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, vol.20, issue.3, pp.273-297, 1995.
DOI : 10.1007/BF00994018

J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks, vol.61, pp.85-117, 2015.
DOI : 10.1016/j.neunet.2014.09.003
URL : http://arxiv.org/pdf/1404.7828

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.
DOI : 10.1038/nature14539

S. Ji, W. Xu, M. Yang, and K. Yu, 3D Convolutional Neural Networks for Human Action Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.1, pp.221-231, 2012.
DOI : 10.1109/TPAMI.2012.59

X. Chen, J. Weng, W. Lu, J. Xu, and J. Weng, Deep manifold learning combined with convolutional neural networks for action recognition, IEEE Transactions on Neural Networks & Learning Systems, issue.99, pp.1-15, 2017.

C. Li, S. Sun, X. Min, W. Lin, B. Nie et al., End-to-end learning of deep convolutional neural network for 3d human action recognition, IEEE International Conference on Multimedia & Expo Workshops, pp.609-612, 2017.

H. Rahmani, A. Mian, and M. Shah, Learning a Deep Model for Human Action Recognition from Novel Viewpoints, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.40, issue.3, pp.667-681, 2018.
DOI : 10.1109/TPAMI.2017.2691768

F. Husain, B. Dellen, and C. Torras, Action Recognition Based on Efficient Deep Feature Learning in the Spatio-Temporal Domain, IEEE Robotics and Automation Letters, vol.1, issue.2, p.984, 2016.
DOI : 10.1109/LRA.2016.2529686

S. V. Mora and W. J. Knottenbelt, Deep Learning for Domain-Specific Action Recognition in Tennis, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp.170-178, 2017.
DOI : 10.1109/CVPRW.2017.27

N. Papenberg, A. Bruhn, T. Brox, S. Didas, and J. Weickert, Highly Accurate Optic Flow Computation with Theoretically Justified Warping, International Journal of Computer Vision, vol.67, issue.2, pp.141-158, 2006.
DOI : 10.1007/s11263-005-3960-y

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, In: Computer Vision and Pattern Recognition, pp.677-691, 2015.

M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, Sequential Deep Learning for Human Action Recognition, Human Behavior Understanding, pp.29-39, 2011.
DOI : 10.1007/978-3-642-25446-8_4
URL : https://hal.archives-ouvertes.fr/hal-01354493

Y. H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga et al., Beyond short snippets: Deep networks for video classification, IEEE Conference on Computer Vision and Pattern Recognition, pp.4694-4702, 2015.

A. Graves, Supervised sequence labelling with recurrent neural networks, 2012.
DOI : 10.1007/978-3-642-24797-2

A. Barbu, A. Bridge, Z. Burchill, D. Coroian, S. Dickinson et al., Video in sentences out, Conference on Uncertainty in Artificial Intelligence, 2012.

Z. W. Yuan and J. Zhang, Feature extraction and image retrieval based on AlexNet, Eighth International Conference on Digital Image Processing, 2016.

S. Baker, S. Roth, D. Scharstein, M. J. Black, J. P. Lewis et al., A database and evaluation methodology for optical flow, IEEE International Conference on Computer Vision, pp.1-31, 2007.
DOI : 10.1109/ICCV.2007.4408903
URL : http://www.cs.brown.edu/people/black/Papers/ofevaltr.pdf

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe: Convolutional architecture for fast feature embedding, Proceedings of the ACM International Conference on Multimedia, MM '14, pp.675-678, 2014.
DOI : 10.1145/2647868.2654889

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25, pp.1097-1105, 2012.

M. Müller and T. Röder, Motion templates for automatic classification and retrieval of motion capture data, ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp.137-146, 2006.

J. Wang, Z. Liu, Y. Wu, and J. Yuan, Learning Actionlet Ensemble for 3D Human Action Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.5, p.914, 2014.
DOI : 10.1109/TPAMI.2013.198

P. Wang, W. Li, Z. Gao, J. Zhang, C. Tang et al., Deep convolutional neural networks for action recognition using depth map sequences, In: Computer Science, 2015.

P. Wei, Y. Zhao, N. Zheng, and S. C. Zhu, Modeling 4D Human-Object Interactions for Joint Event Segmentation, Recognition, and Object Localization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.6, pp.1165-1179, 2017.
DOI : 10.1109/TPAMI.2016.2574712