M. Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.

F. Baradel, C. Wolf, and J. Mille, Human action recognition: Pose-based attention draws focus to hands, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), pp.604-613, 2017.
DOI : 10.1109/iccvw.2017.77
URL : https://hal.archives-ouvertes.fr/hal-01575390

F. Baradel, C. Wolf, J. Mille, and G. W. Taylor, Glimpse clouds: Human activity recognition from unstructured feature points, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
DOI : 10.1109/cvpr.2018.00056
URL : https://hal.archives-ouvertes.fr/hal-01713109

J. Carreira and A. Zisserman, Quo vadis, action recognition? A new model and the Kinetics dataset, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4724-4733, 2017.
DOI : 10.1109/cvpr.2017.502
URL : http://arxiv.org/pdf/1705.07750

K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, Return of the devil in the details: Delving deep into convolutional nets, British Machine Vision Conference, 2014.

G. Chéron, I. Laptev, and C. Schmid, P-CNN: Pose-based CNN features for action recognition, ICCV, 2015.
DOI : 10.1109/iccv.2015.368
URL : https://hal.archives-ouvertes.fr/hal-01187690

F. Chollet, Keras, 2015.

S. Das, M. Koperski, F. Bremond, and G. Francesca, A fusion of appearance based CNNs and temporal evolution of skeleton with LSTM for daily living action recognition, 2018.

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/tpami.2016.2599174
URL : https://doi.org/10.1109/tpami.2016.2599174

C. Feichtenhofer, A. Pinz, and A. Zisserman, Convolutional two-stream network fusion for video action recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1933-1941, 2016.
DOI : 10.1109/cvpr.2016.213
URL : http://arxiv.org/pdf/1604.06573

G. Evangelidis, G. Singh, and R. Horaud, Skeletal quads: Human action recognition using joint quadruples, 22nd International Conference on Pattern Recognition, pp.4513-4518, 2014.

A. Gupta, J. Martinez, J. J. Little, and R. J. Woodham, 3D pose from motion for cross-view action recognition via nonlinear circulant temporal encoding, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.2601-2608, 2014.
DOI : 10.1109/cvpr.2014.333

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

D. Hogg, Model-based vision: a program to see a walking person, Image and Vision Computing, vol.1, issue.1, pp.5-20, 1983.
DOI : 10.1016/0262-8856(83)90003-3

J. F. Hu, W. S. Zheng, J. Lai, and J. Zhang, Jointly learning heterogeneous features for RGB-D activity recognition, CVPR, 2015.

J. F. Hu, W. S. Zheng, J. Lai, and J. Zhang, Jointly learning heterogeneous features for RGB-D activity recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.11, pp.2186-2200, 2017.
DOI : 10.1109/tpami.2016.2640292
URL : http://discovery.dundee.ac.uk/ws/files/11155200/PAMI_2017_JZhang.pdf

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, NIPS, 2012.

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-8, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00548659

I. Lee, D. Kim, S. Kang, and S. Lee, Ensemble deep learning for skeleton-based action recognition using temporal sliding LSTM networks, Proceedings of the IEEE International Conference on Computer Vision, 2017.

B. Li, O. I. Camps, and M. Sznaier, Cross-view activity recognition using Hankelets, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.1362-1369, 2012.

R. Li and T. Zickler, Discriminative virtual views for cross-view action recognition, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.2855-2862, 2012.

J. Liu, A. Shahroudy, D. Xu, and G. Wang, Spatio-temporal LSTM with trust gates for 3D human action recognition, European Conference on Computer Vision (ECCV), pp.816-833, 2016.

J. Liu, G. Wang, P. Hu, L. Duan, and A. C. Kot, Global context-aware attention LSTM networks for 3D action recognition, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3671-3680, 2017.

M. Liu, H. Liu, and C. Chen, Enhanced skeleton visualization for view invariant human action recognition, Pattern Recognition, vol.68, pp.346-362, 2017.

M. Liu and J. Yuan, Recognizing human actions as the evolution of pose estimation maps, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, Recurrent models of visual attention, Proceedings of the 27th International Conference on Neural Information Processing Systems, vol.2, pp.2204-2212, 2014.

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher kernel for large-scale image classification, European Conference on Computer Vision (ECCV), pp.143-156, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00548630

R. Vemulapalli, F. Arrate, and R. Chellappa, Human action recognition by representing 3D skeletons as points in a Lie group, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.588-595, 2014.

H. Rahmani and A. Mian, Learning a non-linear knowledge transfer model for cross-view action recognition, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2458-2466, 2015.

H. Rahmani and A. Mian, 3D action recognition from novel viewpoints, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1506-1515, 2016.

A. Shahroudy, J. Liu, T. T. Ng, and G. Wang, NTU RGB+D: A large scale dataset for 3D human activity analysis, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

A. Shahroudy, T. T. Ng, Y. Gong, and G. Wang, Deep multimodal feature analysis for action recognition in RGB+D videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, issue.99, pp.1-1, 2017.

S. Sharma, R. Kiros, and R. Salakhutdinov, Action recognition using visual attention, 2015.

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, Advances in neural information processing systems, pp.568-576, 2014.

S. Song, C. Lan, J. Xing, W. Zeng, and J. Liu, An end-to-end spatio-temporal attention model for human action recognition from skeleton data, AAAI Conference on Artificial Intelligence, pp.4263-4270, 2017.

K. Soomro, A. R. Zamir, and M. Shah, UCF101: A dataset of 101 human actions classes from videos in the wild, arXiv:1212.0402, 2012.

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning spatiotemporal features with 3D convolutional networks, Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp.4489-4497, 2015.

F. Wang, M. Jiang, C. Qian, S. Yang, C. Li et al., Residual attention network for image classification, 2017.

H. Wang, A. Kläser, C. Schmid, and C. Liu, Action recognition by dense trajectories, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3169-3176, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00583818

J. Wang, X. Nie, Y. Xia, Y. Wu, and S. Zhu, Cross-view action modeling, learning, and recognition, IEEE Conference on Computer Vision and Pattern Recognition, pp.2649-2656, 2014.

P. Wang, W. Li, C. Li, and Y. Hou, Action recognition based on joint trajectory maps with convolutional neural networks, Knowledge-Based Systems, vol.158, pp.43-53, 2018.

S. Yeung, O. Russakovsky, G. Mori, and L. Fei-Fei, End-to-end learning of action detection from frame glimpses in videos, 2015.

P. Zhang, C. Lan, J. Xing, W. Zeng, J. Xue et al., View adaptive recurrent neural networks for high performance human action recognition from skeleton data, The IEEE International Conference on Computer Vision (ICCV), 2017.

S. Zhang, X. Liu, and J. Xiao, On geometric features for skeleton-based action recognition using multilayer LSTM networks, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp.148-157, 2017.

Z. Zhang, C. Wang, B. Xiao, W. Zhou, S. Liu et al., Cross-view action recognition via a continuous virtual path, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.2690-2697, 2013.

M. Zolfaghari, G. L. Oliveira, N. Sedaghat, and T. Brox, Chained multi-stream networks exploiting pose, motion, and appearance for action classification and detection, 2017 IEEE International Conference on Computer Vision (ICCV), pp.2923-2932, 2017.