M. Abadi, TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.

F. Baradel, C. Wolf, and J. Mille, Human action recognition: Pose-based attention draws focus to hands, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), pp.604-613, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01575390

F. Baradel, C. Wolf, and J. Mille, Human activity recognition with pose-driven attention to rgb, The British Machine Vision Conference (BMVC), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01828083

F. Baradel, C. Wolf, J. Mille, and G. W. Taylor, Glimpse clouds: Human activity recognition from unstructured feature points, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01713109

J. Carreira and A. Zisserman, Quo vadis, action recognition? a new model and the kinetics dataset, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4724-4733, 2017.

G. Cheron, I. Laptev, and C. Schmid, P-cnn: Pose-based cnn features for action recognition, ICCV, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01187690

F. Chollet, , 2015.

S. Das, A. Chaudhary, F. Bremond, and M. Thonnat, Where to focus on for human action recognition?, IEEE Winter Conference on Applications of Computer Vision (WACV), pp.71-80, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01927432

J. Donahue, L. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

A. Gupta, J. Martinez, J. J. Little, and R. J. Woodham, 3d pose from motion for cross-view action recognition via nonlinear circulant temporal encoding, IEEE Conference on Computer Vision and Pattern Recognition, pp.2601-2608, 2014.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Comput, vol.9, issue.8, pp.1735-1780, 1997.

J. F. Hu, W. S. Zheng, J. Lai, and J. Zhang, Jointly learning heterogeneous features for rgb-d activity recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.11, pp.2186-2200, 2017.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, NIPS, 2012.

I. Lee, D. Kim, S. Kang, and S. Lee, Ensemble deep learning for skeleton-based action recognition using temporal sliding lstm networks, Proceedings of the IEEE International Conference on Computer Vision, 2017.

J. Liu, G. Wang, P. Hu, L. Duan, and A. C. Kot, Global context-aware attention lstm networks for 3d action recognition, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3671-3680, 2017.

M. Liu, H. Liu, and C. Chen, Enhanced skeleton visualization for view invariant human action recognition, Pattern Recognition, vol.68, pp.346-362, 2017.

M. Liu and J. Yuan, Recognizing human actions as the evolution of pose estimation maps, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

F. Perronnin, J. Sánchez, and T. Mensink, Improving the fisher kernel for large-scale image classification, European conference on computer vision, pp.143-156, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00548630

H. Rahmani and A. Mian, Learning a non-linear knowledge transfer model for cross-view action recognition, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2458-2466, 2015.

H. Rahmani and A. Mian, 3d action recognition from novel viewpoints, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1506-1515, 2016.

G. Rogez, P. Weinzaepfel, and C. Schmid, Lcr-net: Localization-classification-regression for human pose, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1216-1224, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01505085

A. Shahroudy, J. Liu, T. Ng, and G. Wang, Ntu rgb+d: A large scale dataset for 3d human activity analysis, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

A. Shahroudy, T. T. Ng, Y. Gong, and G. Wang, Deep multimodal feature analysis for action recognition in rgb+d videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, issue.99, pp.1-1, 2017.

J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio et al., Real-time human pose recognition in parts from single depth images, CVPR, 2011.

S. Song, N. Cheung, V. Chandrasekhar, and B. Mandal, Deep adaptive temporal pooling for activity recognition, Proceedings of the 26th ACM International Conference on Multimedia, MM '18, pp.1829-1837, 2018.

S. Song, C. Lan, J. Xing, W. Zeng, and J. Liu, An end-to-end spatio-temporal attention model for human action recognition from skeleton data, AAAI Conference on Artificial Intelligence, pp.4263-4270, 2017.

K. Soomro, A. R. Zamir, and M. Shah, Ucf101: A dataset of 101 human actions classes from videos in the wild, vol.12, 2012.

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning spatiotemporal features with 3d convolutional networks, Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ICCV '15, pp.4489-4497, 2015.

G. Varol, I. Laptev, and C. Schmid, Long-term temporal convolutions for action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.40, issue.6, pp.1510-1517, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01241518

H. Wang, A. Kläser, C. Schmid, and C. Liu, Action Recognition by Dense Trajectories, IEEE Conference on Computer Vision & Pattern Recognition, pp.3169-3176, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00583818

H. Wang and C. Schmid, Action recognition with improved trajectories, IEEE International Conference on Computer Vision, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00873267

J. Wang, X. Nie, Y. Xia, Y. Wu, and S. Zhu, Cross-view action modeling, learning, and recognition, IEEE Conference on Computer Vision and Pattern Recognition, pp.2649-2656, 2014.

L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin et al., Temporal segment networks: Towards good practices for deep action recognition, ECCV, 2016.

X. Wang, R. B. Girshick, A. Gupta, and K. He, Non-local neural networks, IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.7794-7803, 2018.

P. Zhang, C. Lan, J. Xing, W. Zeng, J. Xue et al., View adaptive recurrent neural networks for high performance human action recognition from skeleton data, The IEEE International Conference on Computer Vision (ICCV), 2017.

B. Zhou, A. Andonian, A. Oliva, and A. Torralba, Temporal relational reasoning in videos, Computer Vision - ECCV 2018 - 15th European Conference, pp.831-846, 2018.

M. Zolfaghari, G. L. Oliveira, N. Sedaghat, and T. Brox, Chained multi-stream networks exploiting pose, motion, and appearance for action classification and detection, 2017 IEEE International Conference on, pp.2923-2932, 2017.