M. E. Abadi, TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.

S. Abu-el-haija, N. Kothari, J. Lee, P. Natsev, G. Toderici et al., YouTube-8M: A large-scale video classification benchmark, 2016.

R. Al-rfou, G. Alain, A. Almahairi, C. Angermueller, D. Bahdanau et al., Theano: A Python framework for fast computation of mathematical expressions, 2016.

M. S. Aliakbarian, F. Saleh, B. Fernando, M. Salzmann, L. Petersson et al., Deep action- and context-aware sequence learning for activity recognition and anticipation. arXiv preprint, 2016.

M. R. Amer, S. Todorovic, A. Fern, and S. Zhu, Monte Carlo Tree Search for Scheduling Activity Recognition, 2013 IEEE International Conference on Computer Vision, pp.1353-1360, 2013.
DOI : 10.1109/ICCV.2013.171
URL : http://www.stat.ucla.edu/~sczhu/papers/Conf_2013/MC_scheduling_in_AOG_iccv13.pdf

K. Avgerinakis, K. Adam, A. Briassouli, and Y. Kompatsiaris, Moving camera human activity localization and recognition with motionplanes and multiple homographies, 2015 IEEE International Conference on Image Processing (ICIP), pp.2085-2089, 2015.
DOI : 10.1109/ICIP.2015.7351168

M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, Sequential Deep Learning for Human Action Recognition, HBU, pp.29-39, 2011.
DOI : 10.1007/978-3-642-25446-8_4
URL : https://hal.archives-ouvertes.fr/hal-01354493

I. Bayer and T. Silbermann, A multi modal approach to gesture recognition from audio and video data, Proceedings of the 15th ACM on International conference on multimodal interaction, ICMI '13, pp.461-466, 2013.
DOI : 10.1145/2522848.2532592

Y. Bengio, P. Simard, and P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, vol.5, issue.2, pp.157-166, 1994.
DOI : 10.1109/72.279181

H. Bilen, B. Fernando, E. Gavves, A. Vedaldi, and S. Gould, Dynamic image networks for action recognition, CVPR, 2016.
DOI : 10.1109/cvpr.2016.331

N. C. Camgoz, S. Hadfield, O. Koller, and R. Bowden, Using Convolutional 3D Neural Networks for User-independent continuous gesture recognition, 2016 23rd International Conference on Pattern Recognition (ICPR), 2016.
DOI : 10.1109/ICPR.2016.7899606

C. Cao, Y. Zhang, C. Zhang, and H. Lu, Action recognition with joints-pooled

X. Chai, Z. Liu, F. Yin, Z. Liu, and X. Chen, Two streams Recurrent Neural Networks for Large-Scale Continuous Gesture Recognition, 2016 23rd International Conference on Pattern Recognition (ICPR), 2016.
DOI : 10.1109/ICPR.2016.7899603

R. Chaudhry, F. Ofli, G. Kurillo, R. Bajcsy, and R. Vidal, Bioinspired dynamic 3d discriminative skeletal features for human action recognition, CVPRW, pp.471-478, 2013.
DOI : 10.1109/cvprw.2013.153
URL : http://www.cis.jhu.edu/%7Erizwanch/papers/ChaudhryHAU3D13.pdf

R. Chavarriaga, H. Sagha, and J. del R. Millán, Ensemble creation and reconfiguration for activity recognition: An information theoretic approach, 2011 IEEE International Conference on Systems, Man, and Cybernetics, pp.2761-2766, 2011.
DOI : 10.1109/ICSMC.2011.6084090
URL : https://infoscience.epfl.ch/record/166742/files/ITfusion.pdf

C. Chen, B. Zhang, Z. Hou, J. Jiang, M. Liu et al., Action recognition from depth sequences using weighted fusion of 2D and 3D auto-correlation of gradients features, Multimedia Tools and Applications, pp.1-19, 2016.
DOI : 10.1109/TIP.2006.884956

W. Chen and J. J. Corso, Action Detection by Implicit Intentional Motion Clustering, 2015 IEEE International Conference on Computer Vision (ICCV), pp.3298-3306, 2015.
DOI : 10.1109/ICCV.2015.377

G. Chéron, I. Laptev, and C. Schmid, P-CNN: Pose-Based CNN Features for Action Recognition, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.368

Z. Deng, M. Zhai, L. Chen, Y. Liu, S. Muralidharan et al., Deep structured models for group activity recognition. arXiv preprint, 2015.
DOI : 10.5244/c.29.179
URL : http://arxiv.org/pdf/1506.04191

A. Diba, A. Mohammad-pazandeh, H. Pirsiavash, and L. Van-gool, DeepCAMP: Deep Convolutional Action & Attribute Mid-Level Patterns, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.387
URL : http://arxiv.org/pdf/1608.03217

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, CVPR, pp.2625-2634, 2015.
DOI : 10.1109/cvpr.2015.7298878
URL : http://arxiv.org/pdf/1411.4389

Y. Du, W. Wang, and L. Wang, Hierarchical recurrent neural network for skeleton based action recognition, CVPR, pp.1110-1118, 2015.

J. Duan, S. Zhou, J. Wan, X. Guo, and S. Z. Li, Multi-modality fusion based on consensus-voting and 3d convolution for isolated gesture recognition. arXiv preprint, 2016.

I. C. Duta, B. Ionescu, K. Aizawa, and N. Sebe, Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos, MMM, pp.365-378, 2017.
DOI : 10.1109/ICCV.2013.442

T. Eleni, Gesture recognition with a convolutional long short term memory recurrent neural network, 2015.

J. L. Elman, Finding Structure in Time, Cognitive Science, vol.14, issue.2, pp.179-211, 1990.
DOI : 10.1007/BF00308682

H. J. Escalante, I. Guyon, V. Athitsos, P. Jangyodsuk, and J. Wan, Principal motion components for gesture recognition using a single example, 2015.
DOI : 10.1007/s10044-015-0481-3

H. J. Escalante, E. F. Morales, and L. E. Sucar, A naïve Bayes baseline for early gesture recognition, pp.91-99, 2016.
DOI : 10.1016/j.patrec.2016.01.013

H. J. Escalante, ChaLearn Joint Contest on Multimedia Challenges Beyond Visual Analysis: An overview, 2016 23rd International Conference on Pattern Recognition (ICPR), 2016.
DOI : 10.1109/ICPR.2016.7899609
URL : https://hal.archives-ouvertes.fr/hal-01381144

V. Escorcia, F. C. Heilbron, J. C. Niebles, and B. Ghanem, DAPs: Deep Action Proposals for Action Understanding, 2016.
DOI : 10.1007/978-3-319-10602-1_26
URL : https://ivul.kaust.edu.sa/Documents/Publications/2016/DAPs%20Deep%20Action%20Proposals%20for%20Action%20Understanding.pdf

C. Feichtenhofer, A. Pinz, and R. Wildes, Spatiotemporal residual networks for video action recognition, NIPS, pp.3468-3476, 2016.

C. Feichtenhofer, A. Pinz, and A. Zisserman, Convolutional Two-Stream Network Fusion for Video Action Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.213
URL : http://arxiv.org/pdf/1604.06573

F. A. Gers, N. N. Schraudolph, and J. Schmidhuber, Learning precise timing with LSTM recurrent networks, JMLR, vol.3, pp.115-143, 2002.

G. Gkioxari and J. Malik, Finding action tubes, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
DOI : 10.1109/CVPR.2015.7298676

F. Gu, M. Sridhar, A. Cohn, D. Hogg, F. Flórez-Revuelta et al., Weakly supervised activity analysis with spatiotemporal localisation, Neurocomputing, 2016.
DOI : 10.1016/j.neucom.2016.08.032
URL : http://eprints.whiterose.ac.uk/104072/1/weakly-supervised-activity.pdf

S. Han, H. Mao, and W. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding, Proc. ICLR, 2016.

Z. B. Hao, L. Lu, Q. Zhang, J. Wu, E. Izquierdo et al., Action Recognition based on Subdivision-Fusion Model, Proceedings of the British Machine Vision Conference 2015, 2015.
DOI : 10.5244/C.29.50
URL : http://arxiv.org/pdf/1508.04190

F. C. Heilbron, V. Escorcia, B. Ghanem, and J. C. Niebles, ActivityNet: A large-scale video benchmark for human activity understanding, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.961-970, 2015.
DOI : 10.1109/CVPR.2015.7298698
URL : http://repository.kaust.edu.sa/kaust/bitstream/10754/556141/1/ActivityNet_CVPR2015.pdf

S. Hochreiter, Untersuchungen zu dynamischen neuronalen netzen. Diploma, p.91, 1991.

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735

J. Huang, W. Zhou, H. Li, and W. Li, Sign Language Recognition using 3D convolutional neural networks, 2015 IEEE International Conference on Multimedia and Expo (ICME), pp.1-6, 2015.
DOI : 10.1109/ICME.2015.7177428

M. Ibrahim, S. Muralidharan, Z. Deng, A. Vahdat, and G. Mori, A Hierarchical Deep Temporal Model for Group Activity Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2016.217
URL : http://arxiv.org/pdf/1511.06040

A. Jain, J. Tompson, Y. Lecun, and C. Bregler, MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation, pp.302-315
DOI : 10.1007/978-3-319-16808-1_21
URL : http://arxiv.org/pdf/1409.7963

M. Jain, J. Van-gemert, and C. G. Snoek, University of Amsterdam at THUMOS challenge 2014, ECCV THUMOS Challenge 2014, 2014.

M. Jain, J. C. Van-gemert, T. Mensink, and C. G. Snoek, Objects2action: Classifying and Localizing Actions without Any Video Example, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.521

M. Jain, J. C. Van-gemert, and C. G. Snoek, What do 15,000 object categories tell us about classifying and localizing actions?, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.46-55, 2015.
DOI : 10.1109/CVPR.2015.7298599

S. Ji, W. Xu, M. Yang, and K. Yu, 3D Convolutional Neural Networks for Human Action Recognition, ICML, pp.495-502, 2010.
DOI : 10.1109/TPAMI.2012.59
URL : http://www.dbs.informatik.uni-muenchen.de/%7Eyu_k/icml2010_3dcnn.pdf

S. Ji, W. Xu, M. Yang, and K. Yu, 3D Convolutional Neural Networks for Human Action Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.1, pp.221-231, 2013.
DOI : 10.1109/TPAMI.2012.59
URL : http://www.dbs.informatik.uni-muenchen.de/%7Eyu_k/icml2010_3dcnn.pdf

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe, Proceedings of the ACM International Conference on Multimedia, MM '14, pp.675-678, 2014.
DOI : 10.1145/2647868.2654889

S. Karaman, L. Seidenari, A. D. Bagdanov, and A. D. Bimbo, L1-regularized logistic regression stacking and transductive CRF smoothing for action recognition in video, ICCV Workshops, 2013.

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar et al., Large-Scale Video Classification with Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.1725-1732, 2014.
DOI : 10.1109/CVPR.2014.223
URL : http://www.cs.cmu.edu/~rahuls/pub/cvpr2014-deepvideo-rahuls.pdf

T. Kerola, N. Inoue, and K. Shinoda, Cross-view human action recognition from depth maps using spectral graph sequences, Computer Vision and Image Understanding, vol.154, pp.108-126, 2017.
DOI : 10.1016/j.cviu.2016.10.004

J. Konecny and M. Hagara, One-shot-learning gesture recognition using hog-hof features, JMLR, vol.15, pp.2513-2532, 2014.

S. Li, W. Zhang, and A. B. Chan, Maximum-margin structured learning with deep networks for 3d human pose estimation, ICCV, pp.2848-2856, 2015.
DOI : 10.1007/s11263-016-0962-x
URL : http://arxiv.org/pdf/1508.06708

Y. Li, W. Li, V. Mahadevan, and N. Vasconcelos, VLAD3: Encoding Dynamics of Deep Features for Action Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1951-1960, 2016.
DOI : 10.1109/CVPR.2016.215

A. Liu, Y. Su, W. Nie, and M. Kankanhalli, Hierarchical Clustering Multi-Task Learning for Joint Human Action Grouping and Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.1, pp.102-114, 2017.
DOI : 10.1109/TPAMI.2016.2537337

J. Liu, A. Shahroudy, D. Xu, and G. Wang, Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition, ECCV, pp.816-833
DOI : 10.1109/ISSNIP.2014.6827664
URL : http://arxiv.org/pdf/1607.07043

Z. Liu, C. Zhang, and Y. Tian, 3D-based Deep Convolutional Neural Network for action recognition with depth sequences, Image and Vision Computing, vol.55, 2016.
DOI : 10.1016/j.imavis.2016.04.004

J. Luo, W. Wang, and H. Qi, Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps, 2013 IEEE International Conference on Computer Vision, pp.1809-1816, 2013.
DOI : 10.1109/ICCV.2013.227
URL : http://web.eecs.utk.edu/%7Ejluo9/DL-GSGC.pdf

B. Mahasseni and S. Todorovic, Regularizing Long Short Term Memory with 3D Human-Skeleton Sequences for Action Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.333

E. Mansimov, N. Srivastava, and R. Salakhutdinov, Initialization strategies of spatio-temporal convolutional neural networks. arXiv preprint, 2015.

P. Mettes, J. C. Van-gemert, and C. G. Snoek, Spot On: Action Localization from Pointly-Supervised Proposals, European Conference on Computer Vision, pp.437-453, 2016.
DOI : 10.1007/s11263-013-0636-x

P. Molchanov, S. Gupta, K. Kim, and J. Kautz, Hand gesture recognition with 3D convolutional neural networks, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp.1-7, 2015.
DOI : 10.1109/CVPRW.2015.7301342
URL : http://web4.cs.ucl.ac.uk/staff/j.kautz/publications/Gesture_HANDS15.pdf

P. Molchanov, X. Yang, S. Gupta, K. Kim, S. Tyree et al., Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.456

A. Montes, A. Salvador, and X. Giro-i-Nieto, Temporal activity detection in untrimmed videos with recurrent neural networks, 2016.

K. Nasrollahi, S. Escalera, P. Rasti, G. Anbarjafari, X. Baro et al., Deep learning based super-resolution for improved action recognition, IPTA, pp.67-72, 2015.

N. Neverova, C. Wolf, G. Paci, G. Sommavilla, G. W. Taylor et al., A Multi-scale Approach to Gesture Detection and Recognition, 2013 IEEE International Conference on Computer Vision Workshops, pp.484-491, 2013.
DOI : 10.1109/ICCVW.2013.69
URL : https://hal.archives-ouvertes.fr/hal-01339262

N. Neverova, C. Wolf, G. W. Taylor, and F. Nebout, Multi-scale Deep Learning for Gesture Detection and Localization, ECCVW, pp.474-490, 2014.
DOI : 10.1007/978-3-319-16178-5_33
URL : https://hal.archives-ouvertes.fr/hal-01419792

N. Neverova, C. Wolf, G. W. Taylor, and F. Nebout, ModDrop: Adaptive Multi-Modal Gesture Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.8, 2015.
DOI : 10.1109/TPAMI.2015.2461544
URL : https://hal.archives-ouvertes.fr/hal-01178733

B. Ni, Y. Pei, Z. Liang, L. Lin, and P. Moulin, Integrating multi-stage depth-induced contextual information for human action recognition and localization, FG, pp.1-8, 2013.

B. Ni, X. Yang, and S. Gao, Progressively Parsing Interactional Objects for Fine Grained Action Detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.116

N. Nishida and H. Nakayama, Multimodal Gesture Recognition Using Multi-stream Recurrent Neural Network, PSIVT, pp.682-694, 2016.
DOI : 10.1007/978-3-319-29451-3_54

S. Oh, A large-scale benchmark dataset for event recognition in surveillance video, CVPR 2011, pp.3153-3160, 2011.
DOI : 10.1109/CVPR.2011.5995586

E. Ohn-bar and M. M. Trivedi, Hand Gesture Recognition in Real Time for Automotive Interfaces: A Multimodal Vision-Based Approach and Evaluations, IEEE Transactions on Intelligent Transportation Systems, vol.15, issue.6, pp.2368-2377, 2014.
DOI : 10.1109/TITS.2014.2337331

D. Oneata, J. Verbeek, and C. Schmid, The LEAR submission at, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01074442

F. J. Ordóñez and D. Roggen, Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition, Sensors, vol.16, issue.1, p.115, 2016.
DOI : 10.1007/s00779-013-0638-2

W. Ouyang, X. Chu, and X. Wang, Multi-source Deep Learning for Human Pose Estimation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.2337-2344, 2014.
DOI : 10.1109/CVPR.2014.299

X. Peng and C. Schmid, Encoding feature maps of cnns for action recognition, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01236843

X. Peng and C. Schmid, Multi-region Two-Stream R-CNN for Action Detection, ECCV, pp.744-759, 2016.
DOI : 10.1109/CVPR.2015.7298735
URL : https://hal.archives-ouvertes.fr/hal-01349107

X. Peng, L. Wang, Z. Cai, and Y. Qiao, Action and Gesture Temporal Spotting with Super Vector Representation, pp.518-527
DOI : 10.1007/978-3-319-16178-5_36

X. Peng, L. Wang, Z. Cai, Y. Qiao, and Q. Peng, Hybrid super vector with improved dense trajectories for action recognition, ICCV Workshops, 2013.

X. Peng, C. Zou, Y. Qiao, and Q. Peng, Action Recognition with Stacked Fisher Vectors, ECCV, pp.581-595, 2014.
DOI : 10.1007/978-3-319-10602-1_38

L. Pigou, A. V. Oord, S. Dieleman, M. V. Herreweghe, and J. Dambre, Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video, International Journal of Computer Vision, vol.86, issue.11, 1506.
DOI : 10.1109/CVPR.2015.7298935

Z. Qiu, Q. Li, T. Yao, T. Mei, and Y. Rui, MSR Asia MSM at THUMOS challenge 2015, CVPR workshop, 2015.

H. Rahmani and A. Mian, 3D Action Recognition from Novel Viewpoints, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.167

H. Rahmani and A. S. Mian, Learning a non-linear knowledge transfer model for cross-view action recognition, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2458-2466, 2015.
DOI : 10.1109/CVPR.2015.7298860

N. Rhinehart and K. M. Kitani, Learning Action Maps of Large Environments via First-Person Vision, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.69

A. Richard and J. Gall, Temporal Action Detection Using a Statistical Language Model, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.341

H. Sagha, J. del R. Millán, and R. Chavarriaga, Detecting anomalies to improve classification performance in opportunistic sensor networks, 2011 IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops), pp.154-159, 2011.
DOI : 10.1109/PERCOMW.2011.5766860
URL : http://infoscience.epfl.ch/record/163428/files/persens.pdf

H. Sagha, S. T. Digumarti, J. del R. Millán, R. Chavarriaga et al., Benchmarking classification techniques using the Opportunity human activity dataset, 2011 IEEE International Conference on Systems, Man, and Cybernetics, pp.36-40, 2011.
DOI : 10.1109/ICSMC.2011.6083628
URL : https://infoscience.epfl.ch/record/167935/files/IEEESMC2011.pdf

S. Saha, G. Singh, M. Sapienza, P. H. Torr, and F. Cuzzolin, Deep learning for detecting multiple space-time action tubes in videos. arXiv preprint, 2016.

A. Shahroudy, T. Ng, Y. Gong, and G. Wang, Deep multimodal feature analysis for action recognition in RGB+D videos. arXiv preprint, 2016.

L. Shao, L. Liu, and M. Yu, Kernelized Multiview Projection for Robust Action Recognition, International Journal of Computer Vision, vol.34, issue.3, pp.115-129, 2016.
DOI : 10.1145/1273496.1273646
URL : https://link.springer.com/content/pdf/10.1007%2Fs11263-015-0861-6.pdf

Z. Shou, D. Wang, and S. Chang, Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.119
URL : http://arxiv.org/pdf/1601.02129

Z. Shu, K. Yun, and D. Samaras, Action Detection with Improved Dense Trajectories and Sliding Window, pp.541-551
DOI : 10.1007/978-3-319-16178-5_38
URL : http://www3.cs.stonybrook.edu/%7Ekyun/papers/zhixin_kiwon_chalearnLAP2014.pdf

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, NIPS, pp.568-576, 2014.

B. Singh, T. K. Marks, M. Jones, O. Tuzel, and M. Shao, A multistream bi-directional recurrent neural network for fine-grained action detection, CVPR, 2016.
DOI : 10.1109/cvpr.2016.216

S. Singh, C. Arora, and C. V. Jawahar, First Person Action Recognition Using Deep Learned Descriptors, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.287

K. Soomro, H. Idrees, and M. Shah, Action Localization in Videos through Context Walk, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.375

W. Sultani and M. Shah, Automatic action annotation in weakly labeled videos, Computer Vision and Image Understanding, vol.161, 2016.
DOI : 10.1016/j.cviu.2017.05.005

L. Sun, K. Jia, D. Yeung, and B. E. Shi, Human Action Recognition Using Factorized Spatio-Temporal Convolutional Networks, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.522
URL : http://arxiv.org/pdf/1510.00562

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning Spatiotemporal Features with 3D Convolutional Networks, 2015 IEEE International Conference on Computer Vision (ICCV), pp.4489-4497, 2015.
DOI : 10.1109/ICCV.2015.510
URL : http://arxiv.org/pdf/1412.0767

P. Turaga, A. Veeraraghavan, and R. Chellappa, Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.
DOI : 10.1109/CVPR.2008.4587733

G. Varol, I. Laptev, and C. Schmid, Long-term Temporal Convolutions for Action Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
DOI : 10.1109/TPAMI.2017.2712608
URL : https://hal.archives-ouvertes.fr/hal-01241518

V. Veeriah, N. Zhuang, and G. Qi, Differential Recurrent Neural Networks for Action Recognition, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.460

C. Vondrick and D. Ramanan, Video annotation and tracking with active learning, NIPS, 2011.

A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. J. Lang, Phoneme recognition using time-delay neural networks. Readings in speech recognition, pp.393-404, 1990.

H. Wang, D. Oneata, J. Verbeek, and C. Schmid, A Robust and Efficient Video Representation for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, pp.1-20, 2015.
DOI : 10.1109/ICCV.2013.442
URL : https://hal.archives-ouvertes.fr/hal-01145834

H. Wang, W. Wang, and L. Wang, How scenes imply actions in realistic videos?, ICIP, pp.1619-1623, 2016.
DOI : 10.1109/icip.2016.7532632

L. Wang, Y. Qiao, and X. Tang, Action recognition and detection by combining motion and appearance features, THUMOS Action Recognition challenge, pp.1-6, 2014.

L. Wang, Y. Qiao, and X. Tang, Action recognition with trajectory-pooled deep-convolutional descriptors, CVPR, 2015.
DOI : 10.1109/cvpr.2015.7299059
URL : http://arxiv.org/pdf/1505.04868

L. Wang, Y. Qiao, X. Tang, and L. Van-gool, Actionness Estimation Using Hybrid Fully Convolutional Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.296

L. Wang, Z. Wang, Y. Xiong, and Y. Qiao, CUHK&SIAT submission for thumos15 action recognition challenge, THUMOS Action Recognition challenge, pp.1-3, 2015.

P. Wang, W. Li, Z. Gao, J. Zhang, C. Tang et al., Deep convolutional neural networks for action recognition using depth map sequences. arXiv preprint, 2015.

P. Wang, W. Li, S. Liu, Z. Gao, C. Tang et al., Large-scale isolated gesture recognition using convolutional neural networks. arXiv preprint, 2017.
DOI : 10.1109/icpr.2016.7899599
URL : http://arxiv.org/pdf/1701.01814

P. Wang, W. Li, S. Liu, Y. Zhang, Z. Gao et al., Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks, 2016 23rd International Conference on Pattern Recognition (ICPR), 2016.
DOI : 10.1109/ICPR.2016.7899600

Y. Wang and M. Hoai, Improving human action recognition by non-action classification. CoRR, abs/1604.06397, 2016.
DOI : 10.1109/cvpr.2016.295
URL : http://arxiv.org/pdf/1604.06397

Z. Wang, L. Wang, W. Du, and Y. Qiao, Exploring fisher vector and deep networks for action spotting, CVPRW, pp.10-14, 2015.

P. Weinzaepfel, Z. Harchaoui, and C. Schmid, Learning to Track for Spatio-Temporal Action Localization, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.362
URL : https://hal.archives-ouvertes.fr/hal-01159941

C. Wolf, E. Lombardi, J. Mille, O. Celiktutan, M. Jiu et al., Evaluation of video activity localizations integrating quality and quantity measurements, Computer Vision and Image Understanding, vol.127, pp.14-30, 2014.
DOI : 10.1016/j.cviu.2014.06.014
URL : https://hal.archives-ouvertes.fr/hal-01283866

D. Wu, L. Pigou, P. J. Kindermans, N. Le, L. Shao et al., Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.8, pp.1-1, 2016.
DOI : 10.1109/TPAMI.2016.2537340
URL : http://doi.org/10.1109/tpami.2016.2537340

J. Wu, J. Cheng, C. Zhao, and H. Lu, Fusing multi-modal features for gesture recognition, Proceedings of the 15th ACM on International conference on multimodal interaction, ICMI '13, pp.453-460, 2013.
DOI : 10.1145/2522848.2532589

J. Wu, P. Ishwar, and J. Konrad, Two-stream cnns for gesture-based verification and identification: Learning user style, CVPRW, 2016.
DOI : 10.1007/978-3-319-61657-5_7

X. Xu, T. M. Hospedales, and S. Gong, Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation, Proc. ECCV, 2016.
DOI : 10.1007/978-3-642-15561-1_11
URL : http://arxiv.org/pdf/1611.08663

Y. Ye and Y. Tian, Embedding Sequential Information into Spatiotemporal Features for Action Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2016.
DOI : 10.1109/CVPRW.2016.142

S. Yeung, O. Russakovsky, G. Mori, and L. Fei-fei, End-to-End Learning of Action Detection from Frame Glimpses in Videos, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.293

D. Yu, A. Eversole, M. Seltzer, K. Yao, Z. Huang et al., An introduction to computational networks and the computational network toolkit, 2014.

J. Yuan, B. Ni, X. Yang, and A. Kassim, Temporal Action Localization with Pyramid of Score Distribution Features, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.337

J. Yue-Hei Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga et al., Beyond short snippets: Deep networks for video classification, CVPR, pp.4694-4702, 2015.

B. Zhang, L. Wang, Z. Wang, Y. Qiao, and H. Wang, Real-Time Action Recognition with Enhanced Motion Vector CNNs, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.297
URL : http://arxiv.org/pdf/1604.07669

S. Zhao, Y. Liu, Y. Han, and R. Hong, Pooling the Convolutional Layers in Deep ConvNets for Video Action Recognition, IEEE Transactions on Circuits and Systems for Video Technology, 2015.
DOI : 10.1109/TCSVT.2017.2682196

T. Zhou, N. Li, X. Cheng, Q. Xu, L. Zhou et al., Learning semantic context feature-tree for action recognition via nearest neighbor fusion, Neurocomputing, vol.201, pp.1-11, 2016.
DOI : 10.1016/j.neucom.2016.04.007

Y. Zhou, B. Ni, R. Hong, M. Wang, and Q. Tian, Interaction part mining: A mid-level approach for fine-grained action recognition, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3323-3331, 2015.
DOI : 10.1109/CVPR.2015.7298953
URL : http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Zhou_Interaction_Part_Mining_2015_CVPR_paper.pdf

W. Zhu, J. Hu, G. Sun, X. Cao, and Y. Qiao, A Key Volume Mining Deep Framework for Action Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.219

W. Zhu, C. Lan, J. Xing, W. Zeng, Y. Li et al., Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks, 2016.