H. Bay, A. Ess, T. Tuytelaars, and L. Van-gool, Speeded-up robust features (SURF), Comput. Vis. Image Underst, pp.346-359, 2008.
DOI : 10.1016/j.cviu.2007.09.014

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.205.738

T. Berg and P. N. Belhumeur, POOF: Part-Based One-vs.-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.128

T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, High Accuracy Optical Flow Estimation Based on a Theory for Warping, ECCV, 2004.
DOI : 10.1007/978-3-540-24673-2_3

K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, Procedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.76

K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, Return of the Devil in the Details: Delving Deep into Convolutional Nets, Proceedings of the British Machine Vision Conference 2014, 2014.
DOI : 10.5244/C.28.6

X. Chen and A. Yuille, Articulated pose estimation by a graphical model with image dependent pairwise relations, NIPS, 2014.

A. Cherian, J. Mairal, K. Alahari, and C. Schmid, Mixing Body-Part Sequences for Human Pose Estimation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.302

URL : https://hal.archives-ouvertes.fr/hal-00978643

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.177

URL : https://hal.archives-ouvertes.fr/inria-00548512

N. Dalal, B. Triggs, and C. Schmid, Human Detection Using Oriented Histograms of Flow and Appearance, ECCV, 2006.
DOI : 10.1023/A:1008162616689

URL : https://hal.archives-ouvertes.fr/inria-00548587

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., Imagenet: A large-scale hierarchical image database, CVPR, 2009.

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, CVPR, 2015.

M. Douze and H. Jégou, The Yael Library, Proceedings of the ACM International Conference on Multimedia, MM '14, 2014.
DOI : 10.1145/2647868.2654892

URL : https://hal.archives-ouvertes.fr/hal-01020695

K. Duan, D. Parikh, D. Crandall, and K. Grauman, Discovering localized attributes for fine-grained recognition, CVPR, 2012.

G. Farnebäck, Two-Frame Motion Estimation Based on Polynomial Expansion, SCIA, 2003.
DOI : 10.1007/3-540-45103-X_50

M. A. Fischler and R. C. Bolles, Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM, 1981.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.81

G. Gkioxari and J. Malik, Finding action tubes, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298676

H. Jhuang, J. Gall, S. Zuffi, C. Schmid, and M. J. Black, Towards Understanding Action Recognition, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.396

URL : https://hal.archives-ouvertes.fr/hal-00906902

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, 2012.

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126543

I. Laptev, M. Marsza?ek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756

URL : https://hal.archives-ouvertes.fr/inria-00548659

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, Gradientbased learning applied to document recognition, Proceedings of the IEEE, 1998.

B. Ni, V. R. Paramathayalan, and P. Moulin, Multiple Granularity Analysis for Fine-Grained Action Detection, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.102

D. Oneata, J. Verbeek, and C. Schmid, Action and Event Recognition with Fisher Vectors on a Compact Feature Set, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.228

URL : https://hal.archives-ouvertes.fr/hal-00873662

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.222

URL : https://hal.archives-ouvertes.fr/hal-00911179

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_11

URL : https://hal.archives-ouvertes.fr/inria-00548630

L. Pishchulin, M. Andriluka, P. Gehler, and B. Schiele, Poselet Conditioned Pictorial Structures, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.82

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.671.6521

M. Rohrbach, S. Amin, M. Andriluka, and B. Schiele, A database for fine grained activity detection of cooking activities, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247801

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, vol.85, issue.6088, 1986.
DOI : 10.1038/323533a0

B. Sapp and B. Taskar, MODEC: Multimodal Decomposable Models for Human Pose Estimation, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.471

B. Sapp, D. Weiss, and B. Taskar, Parsing human motion with stretchable models, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995607

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., 2004.
DOI : 10.1109/ICPR.2004.1334462

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, NIPS, 2014.

K. Soomro, A. R. Zamir, and M. Shah, UCF101: A dataset of 101 human actions classes from videos in the wild, CRCV-TR-12-01, 2012.

M. Ranzato and L. Wolf, Deepface: Closing the gap to human-level performance in face verification, CVPR, 2014.

J. J. Tompson, A. Jain, Y. Lecun, and C. Bregler, Joint training of a convolutional network and a graphical model for human pose estimation, NIPS, 2014.

A. Toshev and C. Szegedy, DeepPose: Human Pose Estimation via Deep Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.214

URL : http://arxiv.org/abs/1312.4659

H. Wang, A. Kläser, C. Schmid, and C. Liu, Action recognition by dense trajectories, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995407

URL : https://hal.archives-ouvertes.fr/inria-00583818

H. Wang, A. Kläser, C. Schmid, and C. Liu, Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.73, issue.2, pp.60-79, 2013.
DOI : 10.1007/s11263-012-0594-8

URL : https://hal.archives-ouvertes.fr/hal-00725627

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.441

URL : https://hal.archives-ouvertes.fr/hal-00873267

Y. Yang and D. Ramanan, Articulated pose estimation with flexible mixtures-of-parts, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995741

N. J. Yue-hei, H. Matthew, V. Sudheendra, V. Oriol, M. Rajat et al., Beyond short snippets: Deep networks for video classification, CVPR, 2015.

Y. Zhou, B. Ni, R. Hong, M. Wang, and Q. Tian, Interaction part mining: A mid-level approach for fine-grained action recognition, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298953

Y. Zhou, B. Ni, S. Yan, P. Moulin, and Q. Tian, Pipelining Localized Semantic Features for Fine-Grained Action Recognition, ECCV, 2014.
DOI : 10.1007/978-3-319-10593-2_32