B. , P. And-bremond, and F. , Video Covariance Matrix Logarithm for Human Action Recognition in Videos, IJCAI 2015 - 24th International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, 2015.

B. , P. Corvee, E. Bak, S. And-bremond, and F. , Relative Dense Tracklets for Human Action Recognition, 10th IEEE International Conference on Automatic Face and Gesture Recognition, pp.1-7, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00806321

B. , P. Koperski, M. Bak, S. And-bremond, and F. , Representing visual appearance by video brownian covariance descriptor for human action recognition
URL : https://hal.archives-ouvertes.fr/hal-01054943

B. , E. And-bjork, and R. , Memory: Handbook of Perception and Cognition, 1996.

B. , A. F. And-davis, and J. W. , The recognition of human movement using temporal templates, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, pp.257-267, 2001.

B. , S. Escorcia, V. Shen, C. Ghanem, B. And-niebles et al., SST: Single-stream temporal action proposals, CVPR, 2017.

C. , Z. Simon, T. Wei, S. And, and Y. Sheikh, Realtime multi-person 2d pose estimation using part affinity fields, CVPR, pp.14-157, 2017.

C. , Z. Qin, L. Ye, Y. Huang, Q. And-tian et al., Human daily action analysis with multi-view and color-depth data, Proceedings of the 12th International Conference on Computer Vision - ECCV'12, pp.52-61

C. , G. Laptev, I. And-schmid, and C. , Pose-Based CNN Features for Action Recognition, pp.3218-3226
URL : https://hal.archives-ouvertes.fr/hal-01187690

C. , C. And-vapnik, and V. , Support-vector networks, Machine Learning, vol.20, issue.3, pp.273-297, 1995.

M. , F. Crispim-junior, C. F. Koperski, M. Cosar, S. And-bremond et al., Online recognition of daily activities by color-depth sensing and knowledge models, 1528.

Semi-supervised understanding of complex activities from temporal concepts, 13th International Conference on Advanced Video and Signal-Based Surveillance, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01658438

D. , N. And-triggs, and B. , Histograms of oriented gradients for human detection, CVPR, 2005.

D. , J. W. And-bobick, and A. F. , The representation and recognition of human movement using temporal templates, Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR '97), p.928, 1997.

D. , J. Dong, W. Socher, R. Li, L. Li et al., ImageNet: A Large-Scale Hierarchical Image Database, CVPR09, p.35, 2009.

D. , P. Rabaud, V. Cottrell, G. And-belongie, and S. , Behavior recognition via sparse spatio-temporal features, IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), p.16, 2005.

D. , T. Thome, N. And-cord, and M. , MANTRA: Minimum maximum latent structural SVM for image classification and ranking, 2015 IEEE International Conference on Computer Vision (ICCV), pp.2713-2721, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01343784

D. , T. Thome, N. Cord, M. And-picard, and D. , Incremental learning of latent structural svm for weakly supervised image classification, 2014 IEEE International Conference on Image Processing (ICIP), pp.4246-4250, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01077058

E. , C. Masood, S. Z. Tappen, M. F. Laviola, J. et al., Exploring the trade-off between accuracy and observational latency in action recognition, Int. J. Comput. Vision, vol.101, issue.3, pp.420-436, 2013.

E. , V. Caba-heilbron, F. Niebles, J. C. And-ghanem, and B. , DAPs: Deep Action Proposals for Action Understanding, pp.768-784, 2016.

F. , C. Pinz, A. And-zisserman, and A. , Convolutional Two-Stream Network Fusion for Video Action Recognition, pp.1933-1941

F. , P. F. Girshick, R. B. Mcallester, D. And-ramanan, and D. , Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1627-1645, 2010.

F. , B. Gavves, E. Oramas, J. M. Ghodrati, A. And-tuytelaars et al., Modeling Video Evolution for Action Recognition, pp.5378-5387

G. , L. Blank, M. Shechtman, E. Irani, M. And-basri et al., Actions as space-time shapes, IEEE Trans. Pattern Anal. Mach. Intell., vol.29, issue.12, pp.2247-2253, 2007.

G. , M. A. Torki, M. Hussein, M. E. And-el-saban, and M. , Histogram of oriented displacements (hod): Describing trajectories of human joints for action recognition, Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence IJCAI '13, pp.1351-1357, 2013.

H. , A. A. Dartigues-pallez, C. Precioso, F. Riveill, M. Benslimane et al., Human action recognition based on 3d skeleton part-based pose estimation and temporal multi-resolution analysis, 2016 IEEE International Conference on Image Processing (ICIP), pp.3041-3045, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01484111

H. , C. And, and M. Stephens, A combined corner and edge detector, Proc. of Fourth Alvey Vision Conference, pp.147-151, 1988.

H. , F. C. Niebles, J. C. And-ghanem, and B. , Fast temporal activity proposals for efficient detection of human actions in untrimmed videos, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1914-1923, 2016.

H. , J. Zheng, W. Lai, J. And-zhang, and J. , Jointly learning heterogeneous features for RGB-D activity recognition, CVPR, pp.167-171, 2015.

H. , M. E. Torki, M. Gowayyed, M. A. And-el-saban, and M. , Human action recognition using a temporal hierarchy of covariance descriptors on 3d joint locations, Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence IJCAI '13, pp.2466-2472, 2013.

I. , E. Pishchulin, L. Andres, B. Andriluka, M. And-schieke et al., Deepercut: A deeper, stronger, and faster multi-person pose estimation model

J. , S. Xu, W. Yang, M. And, Y. et al., 3d convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.1, pp.221-231, 2013.

K. , M. Bilinski, P. And-bremond, and F. , 3d trajectories for action recognition, 2014 IEEE International Conference on Image Processing (ICIP), pp.4176-4180, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01054949

K. , M. And-bremond, and F. , Modeling Spatial Layout of Features for Real World Scenario RGB-D Action Recognition, AVSS 2016, pp.44-50, 2016.

K. , H. And, and A. Saxena, Learning spatio-temporal structure from rgb-d videos for human activity detection and anticipation, ICML, p.168, 2013.

K. , H. S. Gupta, R. And, and A. Saxena, Learning human activities and object affordances from rgb-d videos, Int. J. Rob. Res., vol.32, issue.8, pp.951-970, 2013.

K. , J. Verbeek, J. And-jurie, and F. , Modeling Spatial Layout with Fisher Vectors for Image Categorization, ICCV, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00612277

K. , A. Sutskever, I. And-hinton, and G. , Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, pp.1097-1105, 2012.

K. , H. Jhuang, H. Garrote, E. Poggio, T. And-serre et al., HMDB: a large video database for human motion recognition, Proceedings of the International Conference on Computer Vision (ICCV), 2011.

L. , Z. Lin, M. Li, X. Hauptmann, A. G. And-raj et al., Beyond Gaussian Pyramid: Multi-Skip Feature Stacking for Action Recognition, pp.204-212

L. , I. Marszalek, M. Schmid, C. And-rozenfeld, and B. , Learning realistic human actions from movies, IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00548659

L. , S. Schmid, C. And-ponce, and J. , Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, CVPR, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00548585

L. , W. Zhang, Z. And-liu, and Z. , Action Recognition Based on a Bag of 3D Points, 2010.

L. , L. Wang, K. Zuo, W. Wang, M. Luo et al., A deep structured model with radius-margin bound for 3d human activity recognition, International Journal of Computer Vision, vol.118, issue.2, pp.256-273, 2016.

L. , L. And, and L. Shao, Learning discriminative representations from rgb-d video data, IJCAI, 2013.

L. , T. Wang, X. Dai, X. And-luo, and J. , Deep recursive and hierarchical conditional random fields for human action recognition, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp.1-9, 2016.

L. , W. Anguelov, D. Erhan, D. Szegedy, C. Reed et al., Single shot multibox detector, ECCV, p.125, 2016.

D. G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2
DOI : 10.1023/B:VISI.0000029664.99615.94
URL : http://www.cs.ubc.ca/~lowe/papers/ijcv03.ps

L. , C. Jia, J. And-tang, and C. , Range-sample depth feature for action recognition

L. , B. D. And-kanade, and T. , An iterative image registration technique with an application to stereo vision, Proceedings of the 7th International Joint Conference on Artificial Intelligence - IJCAI'81, pp.674-679, 1981.

L. , F. And-nevatia, and R. , Recognition and segmentation of 3-d human action using hmm and multi-class adaboost, Proceedings of the 9th European Conference on Computer Vision - Volume Part IV, ECCV'06, pp.359-372, 2006.

M. , S. Sigal, L. And-sclaroff, and S. , Space-time tree ensemble for action recognition, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.5024-5032, 2015.

M. , S. Sigal, L. And-sclaroff, and S. , Learning activity progression in lstms for activity detection and early detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1942-1950, 2016.

M. , R. Zelnik-manor, L. And-tal, and A. , Otc: A novel local descriptor for scene classification, ECCV, 2014.

M. , P. Hebert, M. And-sukthankar, and R. , Trajectons: Action recognition through the motion analysis of tracked features, IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, pp.514-521, 2009.

M. , R. Pal, C. And-kautz, and H. , Activity recognition using the velocity histories of tracked keypoints, IEEE 12th International Conference on Computer Vision, pp.104-111, 2009.

M. , R. Thome, N. Cord, M. Leite, N. J. And-stolfi et al., T-hog: An effective gradient-based descriptor for single line text regions, Pattern Recogn., pp.1078-1090, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01185468

M. , T. B. Hilton, A. And-krüger, and V. , A survey of advances in vision-based human motion capture and analysis, Comput. Vis. Image Underst., vol.104, issue.2

N. , F. Cosar, S. Bremond, F. And-koperski, and M. , Generating unsupervised models for online long-term daily living activity recognition, 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), pp.186-190, 2015.

N. , F. F. Cosar, S. Koperski, M. F. Crispim-junior, C. F. Avgerinakis et al., A hybrid framework for online recognition of activities of daily living in real-world settings, 13th IEEE International Conference on Advanced Video and Signal Based Surveillance - AVSS 2016, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01384710

N. , B. Moulin, P. And, Y. , and S. , Order-preserving sparse coding for sequence classification, ECCV, 2012.

N. , B. Wang, G. And-moulin, and P. , Rgbd-hudaact: A color-depth video database for human daily activity recognition, IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp.1147-1153, 2011.

O. , F. Chaudhry, R. Kurillo, G. Vidal, R. And-bajcsy et al., Sequence of the most informative joints (smij): A new representation for human skeletal action recognition, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp.8-13, 2012.

O. , E. And-trivedi, and M. M. , Joint angles similarities and hog2 for action recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.465-470, 2013.

O. , A. Patras, I. And-pantic, and M. , Spatiotemporal salient points for visual recognition of human actions, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol.36, issue.3, pp.710-719, 2005.

O. , O. And-liu, and Z. , Hon4d: Histogram of oriented 4d normals for activity recognition from depth sequences, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.716-723, 2013.

O. , J. Bak, S. Koperski, M. And-brémond, and F. , Minimizing hallucination in Histogram of Oriented Gradients, The 12th IEEE International Conference on Advanced Video and Signal-based Surveillance, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01199386

Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.

P. , F. Liu, Y. Sanchez, J. And-poirier, and H. , Large-Scale Image Retrieval with Compressed Fisher Vectors, CVPR, pp.33-34, 2010.

P. , F. Sanchez, J. And-mensink, and T. , Improving the Fisher Kernel for Large-Scale image classification, ECCV, pp.33-78, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00548630

P. , L. Insafutdinov, E. Tang, S. Andres, B. Andriluka et al., Deepcut: Joint subset partition and labeling for multi person pose estimation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.14, 2016.

P. , F. Tuzel, O. And-meer, and P. , Covariance tracking using model update based means on riemannian manifolds, CVPR, 2006.

Q. , H. Mao, Y. Xiang, W. And-wang, and Z. , Recognition of human activities using svm multi-class classifier, Pattern Recogn. Lett, vol.31, issue.2, pp.100-111, 2010.

R. , H. Mahmood, A. , Q. Huynh, D. And-mian et al., HOPC: Histogram of Oriented Principal Components of 3D Pointclouds for Action Recognition, pp.742-757

R. , A. S. Azizpour, H. Sullivan, J. And-carlsson, and S. , CNN features off-the-shelf: An astounding baseline for recognition, Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.512-519

R. , S. He, K. Girshick, R. And, and J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems, pp.91-99, 2015.

R. , L. Schauerte, B. Al-halah, Z. And-stiefelhagen, and R. , Important stuff, everywhere! activity recognition with salient proto-objects as context, In WACV, 2014.

S. , P. Ali, S. And, and M. Shah, A 3-dimensional sift descriptor and its application to action recognition, Proceedings of the 15th ACM International Conference on Multimedia MM '07, pp.357-360, 2007.

S. , L. Varano, V. Berretti, S. , D. Bimbo et al., Recognizing actions from depth cameras as weakly aligned multi-part bag-of-poses, In CVPRW, vol.171, p.170, 2013.

S. , S. Maulidevi, N. U. And-aryan, and P. , Human action recognition using dynamic time warping, Proceedings of the 2011 International Conference on Electrical Engineering and Informatics, pp.1-5, 2011.

S. , A. Liu, J. Ng, T. And-wang, and G. , Ntu rgb+d: A large scale dataset for 3d human activity analysis, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

S. , A. Wang, G. And-ng, and T. , Multi-modal feature fusion for action recognition in rgb-d sequences, ISCCSP, 2014.

S. , J. Sharp, T. Kipman, A. Fitzgibbon, A. Finocchio et al., Real-time human pose recognition in parts from single depth images, Commun. ACM, vol.56, issue.1, pp.116-124, 2013.

S. , Z. Wang, D. And, C. , and S. , Temporal action localization in untrimmed videos via multi-stage cnns, CVPR, 2016.

S. , K. And-zisserman, and A. , Two-stream convolutional networks for action recognition in videos, Advances in Neural Information Processing Systems, pp.568-576, 2014.

S. , K. And-zisserman, and A. , Very deep convolutional networks for large-scale image recognition, arXiv preprint, 2014.

S. , K. Zamir, A. R. And, and M. Shah, Ucf101: A dataset of 101 human actions classes from videos in the wild, 2012.

S. , L. And-arras, and K. , People detection in rgb-d data, IROS, 2011.

S. , D. Koperski, M. Bremond, F. And-francesca, and G. , Action recognition based on a mixture of rgb and depth based skeleton, 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp.228-234, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01639504

S. , J. Wu, X. Yan, S. Cheong, L. F. Chua et al., Hierarchical spatio-temporal context modeling for action recognition, 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp.2004-2011, 2009.

S. , J. Ponce, C. Selman, B. And, and A. Saxena, Unstructured human activity detection from rgbd images, ICRA, 2012.

S. , G. J. And-rizzo, and M. L. , Brownian distance covariance, The Annals of Applied Statistics, vol.3, issue.4, pp.1236-1265, 2009.

T. , P. Steinbach, M. And-kumar, and V. , Introduction to Data Mining, p.34, 2005.

A. W. Vieira, E. R. Nascimento, G. L. Oliveira, Z. Liu, and M. F. And-campos, STOP: Space-Time Occupancy Patterns for 3D Action Recognition from Depth Map Sequences
DOI : 10.1007/978-3-642-33275-3_31

V. , C. Khosla, A. Malisiewicz, T. And-torralba, and A. , Hoggles: Visualizing object detection features, ICCV, 2013.

W. , H. Klaser, A. Schmid, C. And-liu, and C. L. , Action recognition by dense trajectories, CVPR 2011, pp.3169-3176, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00583818

W. , H. And-schmid, and C. , Action recognition with improved trajectories, 2013 IEEE International Conference on Computer Vision, pp.3551-3558, 2013.

W. , J. Liu, Z. And-wu, and Y. , Random Occupancy Patterns, pp.41-55

W. , J. Liu, Z. Wu, Y. And-yuan, and J. , Mining actionlet ensemble for action recognition with depth cameras, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.1290-1297, 2012.

W. , L. Qiao, Y. And-tang, and X. , Action recognition with trajectory-pooled deep-convolutional descriptors, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4305-4314, 2015.

W. , L. Xiong, Y. Wang, Z. And-qiao, and Y. , Towards Good Practices for Very Deep Two-Stream ConvNets, 2015.

W. , L. Xiong, Y. Wang, Z. Qiao, Y. Lin et al., Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, 2016.

W. , P. Li, W. Gao, Z. Tang, C. Zhang et al., ConvNets-Based Action Recognition from Depth Maps Through Virtual Cameras and Pseudocoloring, Proceedings of the 23rd ACM International Conference on Multimedia MM '15, pp.1119-1122

W. , P. Li, W. Gao, Z. Zhang, J. Tang et al., Action recognition from depth maps using deep convolutional neural networks, IEEE Transactions on Human-Machine Systems, vol.46, issue.4, pp.498-509, 2016.

W. , X. Thome, N. And-cord, and M. , Gaze latent support vector machine for image classification, 2016 IEEE International Conference on Image Processing (ICIP), pp.236-240, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01342580

W. , S. Ramakrishna, V. Kanade, T. And, and Y. Sheikh, Convolutional pose machines, CVPR, p.14, 2016.

W. , G. Tuytelaars, T. And, and L. Van-gool, An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector, pp.650-663, 2008.

W. , S. F. And-cipolla, and R. , Extracting spatiotemporal interest points using global information, IEEE 11th International Conference on Computer Vision, pp.1-8, 2007.

W. , Z. Wang, X. Jiang, Y. Ye, H. And-xue et al., Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification, Proceedings of the 23rd ACM International Conference on Multimedia MM '15, pp.461-470

X. , L. And-aggarwal, and J. K. , Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera, Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition CVPR '13, pp.2834-2841

X. , L. Chen, C. And-aggarwal, and J. K. , View invariant human action recognition using histograms of 3d joints, CVPR Workshops, pp.20-27, 2012.

X. , B. Fu, Y. Jiang, Y. Li, B. And-sigal et al., Video emotion recognition with transferred deep feature encodings, Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval ICMR '16, pp.15-22

Y. , X. And-tian, and Y. , Eigenjoints-based action recognition using naïve-Bayes-nearest-neighbor, CVPR Workshops, pp.14-19, 2012.

Y. , X. And-tian, and Y. , Super normal vector for activity recognition using depth sequences, CVPR, 2014.

Y. , X. And-tian, and Y. , Super normal vector for human activity recognition with depth cameras, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.5, pp.1028-1039, 2017.

Y. , X. Zhang, C. And-tian, and Y. , Recognizing actions using depth motion maps-based histograms of oriented gradients, Proceedings of the 20th ACM International Conference on Multimedia MM '12, pp.1057-1060

Y. , E. And-aggarwal, and J. , Human action recognition with extremities as semantic posture representation, pp.1-8, 2009.

Z. , S. Bilinski, P. And-bremond, and F. , Towards Unsupervised Sudden Group Movement Discovery for Video Surveillance, VISAPP - 9th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00878580

Z. , S. Luisier, F. Andrews, W. Srivastava, N. And-salakhutdinov et al., Exploiting Image-trained CNN Architectures for Unconstrained Video Classification, 2015.

Z. , J. Marszałek, M. Lazebnik, S. And-schmid, and C. , Local features and kernels for classification of texture and object categories: A comprehensive study, International Journal of Computer Vision, vol.73, issue.2, pp.213-238, 2007.

Z. , Y. Liu, Z. , Y. , L. And-cheng et al., Combing rgb and depth map features for human activity recognition, Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, pp.1-4, 2012.

P. Zhu, W. Hu, L. Li, and Q. And-wei, Human Activity Recognition Based on $\Re$ Transform and Fourier Mellin Transform, pp.631-640, 2009.
DOI : 10.1007/978-3-642-10520-3_60

Z. , Y. Chen, W. And-guo, and G. , Evaluating spatiotemporal interest point features for depth-based action recognition, Image and Vision Computing, vol.32, issue.8, pp.453-464, 2014.