A. Abdulnabi, B. Shuai, S. Winkler, and G. Wang, Episodic CAMN: Contextual Attention-Based Memory Networks with Iterative Feedback for Scene Labeling, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.665

J. Ba, V. Mnih, and K. Kavukcuoglu, Multiple object recognition with visual attention, ICLR, 2015.

M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, Sequential Deep Learning for Human Action Recognition, HBU, 2011.
DOI : 10.1007/978-3-642-25446-8_4
URL : https://hal.archives-ouvertes.fr/hal-01354493

D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, ICLR, 2015.

F. Baradel, C. Wolf, and J. Mille, Human Action Recognition: Pose-Based Attention Draws Focus to Hands, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017.
DOI : 10.1109/ICCVW.2017.77
URL : https://hal.archives-ouvertes.fr/hal-01575390

F. Baradel, C. Wolf, and J. Mille, Pose-conditioned spatiotemporal attention for human action recognition, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01593548

M. Bellver, X. Giró-i-Nieto, F. Marques, and J. Torres, Hierarchical object detection with deep reinforcement learning, Deep Reinforcement Learning Workshop, NIPS, 2016.

J. Carreira and A. Zisserman, Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.502
URL : http://arxiv.org/pdf/1705.07750

K. Cho, A. Courville, and Y. Bengio, Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks, IEEE Transactions on Multimedia, vol.17, issue.11, pp.1875-1886, 2015.
DOI : 10.1109/TMM.2015.2477044
URL : http://arxiv.org/pdf/1507.01053

K. Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio, On the properties of neural machine translation: Encoder-Decoder approaches, arXiv preprint, 2014.
DOI : 10.3115/v1/w14-4012
URL : https://doi.org/10.3115/v1/w14-4012

Y. Du, W. Wang, and L. Wang, Hierarchical recurrent neural network for skeleton based action recognition, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

G. Evangelidis, G. Singh, and R. Horaud, Skeletal quads: human action recognition using joint quadruples, ICPR, pp.4513-4518, 2014.
DOI : 10.1109/icpr.2014.772
URL : https://hal.archives-ouvertes.fr/hal-00989725

A. Graves, G. Wayne, and I. Danihelka, Neural Turing machines, arXiv preprint, 2014.

K. Gregor, I. Danihelka, A. Graves, D. Rezende, and D. Wierstra, DRAW: A recurrent neural network for image generation, ICML, 2015.

A. Gupta, J. Martinez, J. Little, and R. Woodham, 3D Pose from Motion for Cross-View Action Recognition via Non-linear Circulant Temporal Encoding, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.333
URL : http://www.cs.ubc.ca/%7Ejulm/papers/ankur_cvpr_14.pdf

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.90
URL : http://arxiv.org/pdf/1512.03385

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735

J. Hu, W. Zheng, J. Lai, and J. Zhang, Jointly learning heterogeneous features for RGB-D activity recognition, CVPR, pp.5344-5352, 2015.
DOI : 10.1109/tpami.2016.2640292
URL : http://discovery.dundee.ac.uk/ws/files/11155200/PAMI_2017_JZhang.pdf

M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, Spatial transformer networks, NIPS, pp.2017-2025, 2015.

A. Jain, A. R. Zamir, S. Savarese, and A. Saxena, Structural-RNN: Deep Learning on Spatio-Temporal Graphs, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.573

S. Ji, W. Xu, M. Yang, and K. Yu, 3D Convolutional Neural Networks for Human Action Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.1, pp.221-231, 2013.
DOI : 10.1109/TPAMI.2012.59

Z. Jie, X. Liang, J. Feng, X. Jin, W. Lu et al., Tree-structured reinforcement learning for sequential object localization, Advances in Neural Information Processing Systems 29, pp.127-135, 2016.

Q. Ke, M. Bennamoun, S. An, F. Sohel, and F. Boussaid, A New Representation of Skeleton Sequences for 3D Action Recognition, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.486

Y. Kim, C. Denton, L. Hoang, and A. Rush, Structured attention networks, ICLR, 2017.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, ICLR, 2015.

J. Kuen, Z. Wang, and G. Wang, Recurrent Attentional Networks for Saliency Detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3668-3677, 2016.
DOI : 10.1109/CVPR.2016.399

A. Kumar, O. Irsoy, J. Su, R. Bradbury, R. English et al., Ask me anything: Dynamic memory networks for natural language processing, ICML, 2016.

H. Larochelle and G. Hinton, Learning to combine foveal glimpses with a third-order Boltzmann machine, NIPS, pp.1243-1251, 2010.

I. Lee, D. Kim, S. Kang, and S. Lee, Ensemble Deep Learning for Skeleton-Based Action Recognition Using Temporal Sliding LSTM Networks, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/ICCV.2017.115

B. Li, O. Camps, and M. Sznaier, Cross-view activity recognition using hankelets, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

R. Li and T. Zickler, Discriminative virtual views for cross-view action recognition, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

Z. Li, K. Gavrilyuk, E. Gavves, M. Jain, and C. Snoek, VideoLSTM convolves, attends and flows for action recognition, Computer Vision and Image Understanding, vol.166, issue.3, 2017.
DOI : 10.1016/j.cviu.2017.10.011

J. Liu, A. Shahroudy, D. Xu, and G. Wang, Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition, ECCV, pp.816-833, 2016.

J. Liu, G. Wang, P. Hu, L. Duan, and A. Kot, Global Context-Aware Attention LSTM Networks for 3D Action Recognition, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.391

M. Liu, H. Liu, and C. Chen, Enhanced skeleton visualization for view invariant human action recognition, Pattern Recognition, vol.68, issue.7, pp.346-362, 2017.
DOI : 10.1016/j.patcog.2017.02.030

D. Luvizon, D. Picard, and H. Tabia, 2D/3D pose estimation and action recognition using multitask deep learning, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01815703

S. Mathe, A. Pirinen, and C. Sminchisescu, Reinforcement Learning for Visual Object Detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.316

V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, Recurrent models of visual attention, NIPS, pp.2204-2212, 2014.

P. Molchanov, X. Yang, S. Gupta, K. Kim, S. Tyree et al., Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4207-4215, 2016.
DOI : 10.1109/CVPR.2016.456

N. Neverova, C. Wolf, G. Taylor, and F. Nebout, ModDrop: Adaptive Multi-Modal Gesture Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.8, pp.1692-1706, 2016.
DOI : 10.1109/TPAMI.2015.2461544
URL : https://hal.archives-ouvertes.fr/hal-01178733

H. Rahmani and A. Mian, Learning a non-linear knowledge transfer model for cross-view action recognition, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298860

H. Rahmani and A. Mian, 3D Action Recognition from Novel Viewpoints, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.167

A. Shahroudy, J. Liu, T. Ng, and G. Wang, NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1010-1019, 2016.
DOI : 10.1109/CVPR.2016.115

A. Shahroudy, T. Ng, Y. Gong, and G. Wang, Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.40, issue.5, 2018.
DOI : 10.1109/TPAMI.2017.2691321
URL : http://arxiv.org/pdf/1603.07120

S. Sharma, R. Kiros, and R. Salakhutdinov, Action recognition using visual attention, 2016.

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, Advances in Neural Information Processing Systems, pp.568-576, 2014.

S. Song, C. Lan, J. Xing, W. Zeng, and J. Liu, An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data, AAAI Conf. on AI, 2017.

S. Sukhbaatar, A. Szlam, J. Weston, and R. Fergus, End-to-end memory networks, NIPS, pp.2440-2448, 2015.

L. Sun, K. Jia, K. Chen, D. Yeung, B. Shi et al., Lattice Long Short-Term Memory for Human Action Recognition, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/ICCV.2017.236
URL : http://arxiv.org/pdf/1708.03958

L. Tao and R. Vidal, Moving Poselets: A Discriminative and Interpretable Skeletal Motion Representation for Action Recognition, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), pp.303-311, 2015.
DOI : 10.1109/ICCVW.2015.48

P. Tokmakov, K. Alahari, and C. Schmid, Learning Video Object Segmentation with Visual Memory, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/ICCV.2017.480
URL : https://hal.archives-ouvertes.fr/hal-01511145

R. Vemulapalli, F. Arrate, and R. Chellappa, Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.588-595, 2014.
DOI : 10.1109/CVPR.2014.82

J. Wang, X. Nie, Y. Xia, Y. Wu, and S.-C. Zhu, Cross-View Action Modeling, Learning, and Recognition, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.339
URL : http://arxiv.org/pdf/1405.2941

P. Wang, W. Li, C. Li, and Y. Hou, Action Recognition Based on Joint Trajectory Maps with Convolutional Neural Networks, ACM Conference on Multimedia, 2016.
DOI : 10.1016/j.knosys.2018.05.029
URL : http://arxiv.org/pdf/1612.09401

D. Wu, L. Pigou, P. Kindermans, N. D. Le, L. Shao et al., Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.8, pp.1583-1597, 2016.
DOI : 10.1109/TPAMI.2016.2537340
URL : http://doi.org/10.1109/tpami.2016.2537340

K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville et al., Show, attend and tell: Neural image caption generation with visual attention, ICML, pp.2048-2057, 2015.

S. Yeung, O. Russakovsky, N. Jin, M. Andriluka, G. Mori et al., Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos, International Journal of Computer Vision, vol.126, pp.375-389, 2018.
URL : http://arxiv.org/pdf/1507.05738

S. Yeung, O. Russakovsky, G. Mori, and L. Fei-Fei, End-to-end Learning of Action Detection from Frame Glimpses in Videos, CVPR, 2016.
DOI : 10.1109/cvpr.2016.293
URL : http://arxiv.org/pdf/1511.06984

K. Yun, J. Honorio, D. Chattopadhyay, T. L. Berg, and D. Samaras, Two-person interaction detection using body-pose features and multiple instance learning, CVPR Workshop, pp.28-35, 2012.
DOI : 10.1109/cvprw.2012.6239234
URL : http://www.cs.sunysb.edu/%7Eial/content/papers/2012/kiwon_hau3d12.pdf

M. Zanfir, M. Leordeanu, and C. Sminchisescu, The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection, 2013 IEEE International Conference on Computer Vision, pp.2752-2759, 2013.
DOI : 10.1109/ICCV.2013.342

P. Zhang, C. Lan, J. Xing, W. Zeng, J. Xue et al., View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/ICCV.2017.233
URL : http://arxiv.org/pdf/1703.08274

Z. Zhang, C. Wang, B. Xiao, W. Zhou, S. Liu et al., Cross-View Action Recognition via a Continuous Virtual Path, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.347
URL : http://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Zhang_Cross-View_Action_Recognition_2013_CVPR_paper.pdf