K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, Advances in Neural Information Processing Systems, pp.568-576, 2014.

J. C. Niebles and L. Fei-Fei, A Hierarchical Model of Shape and Appearance for Human Action Classification, 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2007.
DOI : 10.1109/CVPR.2007.383132

J. C. Niebles, H. Wang, and L. Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, International Journal of Computer Vision, vol.79, issue.3, pp.299-318, 2008.
DOI : 10.1007/s11263-007-0122-4

M. Marszalek, I. Laptev, and C. Schmid, Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp.2929-2936, 2009.
DOI : 10.1109/CVPR.2009.5206557
URL : https://hal.archives-ouvertes.fr/inria-00548645

H. Wang, A. Kläser, C. Schmid, and C. L. Liu, Action recognition by dense trajectories, CVPR 2011, pp.3169-3176, 2011.
DOI : 10.1109/CVPR.2011.5995407
URL : https://hal.archives-ouvertes.fr/inria-00583818

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, pp.3551-3558, 2013.
DOI : 10.1109/ICCV.2013.441
URL : https://hal.archives-ouvertes.fr/hal-00873267

S. R. Fanello, I. Gori, G. Metta, and F. Odone, Keep it simple and sparse: Real-time action recognition, Journal of Machine Learning Research, vol.14, pp.2617-2640, 2013.

S. Chandra, S. Tsogkas, and I. Kokkinos, Accurate Human-Limb Segmentation in RGB-D Images for Intelligent Mobility Assistance Robots, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), pp.44-50, 2015.
DOI : 10.1109/ICCVW.2015.64
URL : https://hal.archives-ouvertes.fr/hal-01263621

E. S. Fotinea, E. Efthimiou, A. L. Dimou, T. Goulas, P. Karioris et al., Data Acquisition towards Defining a Multimodal Interaction Model for Human-Assistive Robot Communication, International Conference on Universal Access in Human-Computer Interaction, pp.613-624, 2014.
DOI : 10.1007/978-3-319-07446-7_59

S. Escalera, J. Gonzàlez, X. Baró, M. Reyes, I. Guyon et al., ChaLearn multi-modal gesture recognition 2013, Proceedings of the 15th ACM on International conference on multimodal interaction, ICMI '13, pp.365-368, 2013.
DOI : 10.1145/2522848.2532597

Y. Sun, X. Wang, and X. Tang, Deep Convolutional Network Cascade for Facial Point Detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.3476-3483, 2013.
DOI : 10.1109/CVPR.2013.446

A. Toshev and C. Szegedy, DeepPose: Human Pose Estimation via Deep Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.1653-1660, 2014.
DOI : 10.1109/CVPR.2014.214

J. Tompson, A. Jain, Y. LeCun, and C. Bregler, Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation, Advances in Neural Information Processing Systems, 2014.

V. Belagiannis and A. Zisserman, Recurrent human pose estimation. arXiv preprint, 2016.

I. Lifshitz, E. Fetaya, and S. Ullman, Human pose estimation using deep consensus voting, arXiv preprint arXiv:1603.08212, 2016.
DOI : 10.1007/978-3-319-46475-6_16
URL : http://arxiv.org/abs/1603.08212

A. Haque, B. Peng, Z. Luo, A. Alahi, S. Yeung et al., Viewpoint invariant 3d human pose estimation with recurrent error feedback. arXiv preprint, 2016.
DOI : 10.1007/978-3-319-46448-0_10
URL : http://arxiv.org/abs/1603.07076

V. Ramakrishna, D. Munoz, M. Hebert, J. A. Bagnell, and Y. Sheikh, Pose Machines: Articulated Pose Estimation via Inference Machines, European Conference on Computer Vision, pp.33-47, 2014.
DOI : 10.1007/978-3-319-10605-2_3
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.640.2838

J. Carreira, P. Agrawal, K. Fragkiadaki, and J. Malik, Human pose estimation with iterative error feedback, arXiv preprint arXiv:1507.06550, 2015.
DOI : 10.1109/cvpr.2016.512
URL : http://arxiv.org/abs/1507.06550

E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, and B. Schiele, Deepercut: A deeper, stronger, and faster multi-person pose estimation model. arXiv preprint, 2016.
DOI : 10.1007/978-3-319-46466-4_3
URL : http://arxiv.org/abs/1605.03170

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735

M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, 2D Human Pose Estimation: New Benchmark and State of the Art Analysis, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.471

S. Johnson and M. Everingham, Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation, Proceedings of the British Machine Vision Conference 2010, 2010.
DOI : 10.5244/C.24.12
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.175.2192

H. Wang, M. M. Ullah, A. Kläser, I. Laptev, and C. Schmid, Evaluation of local spatio-temporal features for action recognition, Proceedings of the British Machine Vision Conference 2009, 2009.
DOI : 10.5244/C.23.124
URL : https://hal.archives-ouvertes.fr/inria-00439769

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659

I. Rodomagoulakis, N. Kardaris, V. Pitsikalis, A. Arvanitakis, and P. Maragos, A multimedia gesture dataset for human robot communication: Acquisition, tools and recognition results, 2016 IEEE International Conference on Image Processing (ICIP), 2016.
DOI : 10.1109/ICIP.2016.7532923