F. Baradel, C. Wolf, and J. Mille, Human action recognition: Pose-based attention draws focus to hands, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), pp.604-613, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01575390

F. Baradel, C. Wolf, and J. Mille, Human activity recognition with pose-driven attention to rgb, The British Machine Vision Conference (BMVC), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01828083

F. Baradel, C. Wolf, J. Mille, and G. W. Taylor, Glimpse clouds: Human activity recognition from unstructured feature points, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01713109

J. Carreira and A. Zisserman, Quo vadis, action recognition? a new model and the kinetics dataset, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4724-4733, 2017.

G. Cheron, I. Laptev, and C. Schmid, P-cnn: Pose-based cnn features for action recognition, ICCV, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01187690

D. Damen, H. Doughty, G. M. Farinella, S. Fidler, A. Furnari et al., Scaling egocentric vision: The EPIC-KITCHENS dataset, 2018.

DARPA and Kitware, Virat video dataset, 2019.

S. Das, M. Koperski, F. Brémond, and G. Francesca, Deep-temporal lstm for daily living action recognition, 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp.1-6, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01896064

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A Large-Scale Hierarchical Image Database, CVPR09, 2009.

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

F. Faugeras and L. Naccache, Dissociating temporal attention from spatial attention and motor response preparation: A high-density eeg study, NeuroImage, vol.124, pp.947-957, 2016.

C. Feichtenhofer, H. Fan, J. Malik, and K. He, Slowfast networks for video recognition, CoRR, 2018.

R. Girdhar, J. Carreira, C. Doersch, and A. Zisserman, , 2018.

R. Goyal, S. E. Kahou, V. Michalski, J. Materzynska, S. Westphal et al., The "something something" video database for learning and evaluating visual common sense, 2017.

C. Gu, C. Sun, D. A. Ross, C. Vondrick, C. Pantofaru et al., Ava: A video dataset of spatio-temporally localized atomic visual actions, Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01764300

A. Gupta, J. Martinez, J. J. Little, and R. J. Woodham, 3d pose from motion for cross-view action recognition via non-linear circulant temporal encoding, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.2601-2608, 2014.

J. Hu, W. Zheng, J. Lai, and J. Zhang, Jointly learning heterogeneous features for rgb-d activity recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.11, pp.2186-2200, 2017.

P. Hu and D. Ramanan, Finding tiny faces, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems, pp.1097-1105, 2012.

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, Hmdb: a large video database for human motion recognition, 2011 International Conference on Computer Vision, pp.2556-2563, 2011.

I. Lee, D. Kim, S. Kang, and S. Lee, Ensemble deep learning for skeleton-based action recognition using temporal sliding lstm networks, Proceedings of the IEEE International Conference on Computer Vision, 2017.

J. Liu, A. Shahroudy, D. Xu, and G. Wang, Spatio-temporal lstm with trust gates for 3d human action recognition, Computer Vision - ECCV 2016, pp.816-833, 2016.

M. Liu, H. Liu, and C. Chen, Enhanced skeleton visualization for view invariant human action recognition, Pattern Recognition, vol.68, pp.346-362, 2017.

M. Liu and J. Yuan, Recognizing human actions as the evolution of pose estimation maps, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

B. Mahasseni and S. Todorovic, Regularizing long short term memory with 3d human-skeleton sequences for action recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3054-3062, 2016.

F. Perronnin, J. Sánchez, and T. Mensink, Improving the fisher kernel for large-scale image classification, European conference on computer vision, pp.143-156, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00548630

H. Rahmani and A. Mian, Learning a non-linear knowledge transfer model for cross-view action recognition, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2458-2466, 2015.

H. Rahmani and A. Mian, 3d action recognition from novel viewpoints, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1506-1515, 2016.

G. Rogez, P. Weinzaepfel, and C. Schmid, LCR-Net++: Multi-person 2D and 3D Pose Detection in Natural Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01961189

M. Rohrbach, A. Rohrbach, M. Regneri, S. Amin, M. Andriluka et al., Recognizing fine-grained and composite activities using hand-centric features and script data, International Journal of Computer Vision, pp.1-28, 2015.

S. M. Safdarnejad, X. Liu, L. Udpa, B. Andrus, J. Wood et al., Sports videos in the wild (svw): A video dataset for sports analysis, 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), vol.1, pp.1-7, 2015.

A. Shahroudy, J. Liu, T. Ng, and G. Wang, Ntu rgb+d: A large scale dataset for 3d human activity analysis, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

S. Sharma, R. Kiros, and R. Salakhutdinov, Action recognition using visual attention, 2015.

G. A. Sigurdsson, G. Varol, X. Wang, A. Farhadi, I. Laptev et al., Hollywood in homes: Crowdsourcing data collection for activity understanding, European Conference on Computer Vision (ECCV), 2016.

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, Advances in neural information processing systems, pp.568-576, 2014.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

S. Song, C. Lan, J. Xing, W. Zeng, and J. Liu, An end-to-end spatio-temporal attention model for human action recognition from skeleton data, AAAI Conference on Artificial Intelligence, pp.4263-4270, 2017.

K. Soomro, A. R. Zamir, and M. Shah, Ucf101: A dataset of 101 human actions classes from videos in the wild, vol.12, 2012.

J. Sung, C. Ponce, B. Selman, and A. Saxena, Human activity detection from rgbd images, AAAI workshop, 2011.

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning spatiotemporal features with 3d convolutional networks, Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ICCV '15, pp.4489-4497, 2015.

G. Vaquette, A. Orcesi, L. Lucat, and C. Achard, The daily home life activity dataset: a high semantic activity dataset for online recognition, 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pp.497-504, 2017.
URL : https://hal.archives-ouvertes.fr/cea-01841019

H. Wang, A. Kläser, C. Schmid, and C. Liu, Action Recognition by Dense Trajectories, IEEE Conference on Computer Vision & Pattern Recognition, pp.3169-3176, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00583818

H. Wang and C. Schmid, Action recognition with improved trajectories, IEEE International Conference on Computer Vision, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00873267

J. Wang, Z. Liu, Y. Wu, and J. Yuan, Mining actionlet ensemble for action recognition with depth cameras, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

J. Wang, X. Nie, Y. Xia, Y. Wu, and S. Zhu, Cross-view action modeling, learning, and recognition, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.2649-2656, 2014.

X. Wang, R. B. Girshick, A. Gupta, and K. He, Non-local neural networks, IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.7794-7803, 2018.

P. Zhang, C. Lan, J. Xing, W. Zeng, J. Xue et al., View adaptive recurrent neural networks for high performance human action recognition from skeleton data, The IEEE International Conference on Computer Vision (ICCV), 2017.

Z. Zhang, Microsoft kinect sensor and its effect, IEEE MultiMedia, vol.19, 2012.