H. Wang, D. Oneata, J. Verbeek, and C. Schmid, A Robust and Efficient Video Representation for Action Recognition, International Journal of Computer Vision, vol.119, issue.3, 2016.
DOI : 10.1007/s11263-015-0846-5
URL : https://hal.archives-ouvertes.fr/hal-01145834

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, In: NIPS, 2014.

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning Spatiotemporal Features with 3D Convolutional Networks, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.510

J. Y. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga et al., Beyond short snippets: Deep networks for video classification, In: CVPR, 2015.

G. Gkioxari and J. Malik, Finding action tubes, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298676

P. Weinzaepfel, Z. Harchaoui, and C. Schmid, Learning to Track for Spatio-Temporal Action Localization, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.362
URL : https://hal.archives-ouvertes.fr/hal-01159941

L. Wang, Y. Qiao, and X. Tang, Video action detection with relational dynamic-poselets, In: ECCV, 2014.

M. Puscas, E. Sangineto, D. Culibrk, and N. Sebe, Unsupervised Tube Extraction Using Transductive Learning and Dense Trajectories, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.193

J. C. Van-gemert, M. Jain, E. Gati, and C. G. Snoek, APT: Action localization proposals from dense trajectories, Proceedings of the British Machine Vision Conference 2015, 2015.
DOI : 10.5244/C.29.177

A. Kläser, M. Marszalek, C. Schmid, and A. Zisserman, Human Focused Action Localization in Video, International Workshop on Sign, Gesture, and Activity (SGA), 2010.
DOI : 10.1007/978-3-642-35749-7_17

I. Laptev and P. Pérez, Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409105

G. Yu and J. Yuan, Fast action proposals for human action detection and search, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298735

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
DOI : 10.1109/TPAMI.2016.2577031

M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, 2D Human Pose Estimation: New Benchmark and State of the Art Analysis, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.471

S. Hare, A. Saffari, and P. Torr, Struck: Structured output tracking with kernels, In: ICCV, 2011.

Z. Kalal, K. Mikolajczyk, and J. Matas, Tracking-Learning-Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.7, 2012.
DOI : 10.1109/TPAMI.2011.239

R. G. Cinbis, J. Verbeek, and C. Schmid, Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.1, 2016.
DOI : 10.1109/TPAMI.2016.2535231
URL : https://hal.archives-ouvertes.fr/hal-01123482

L. Cao, Z. Liu, and T. S. Huang, Cross-dataset action detection, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539875
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.459.6620

J. Yuan, Z. Liu, and Y. Wu, Discriminative subvolume search for efficient action detection, In: CVPR, 2009.

A. Gaidon, Z. Harchaoui, and C. Schmid, Temporal Localization of Actions with Actoms, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.11, 2013.
DOI : 10.1109/TPAMI.2013.65
URL : https://hal.archives-ouvertes.fr/hal-00687312

T. Lan, Y. Wang, and G. Mori, Discriminative figure-centric models for joint action localization and recognition, In: ICCV, 2011.

A. Kläser, M. Marszałek, and C. Schmid, A Spatio-Temporal Descriptor Based on 3D-Gradients, Proceedings of the British Machine Vision Conference 2008, 2008.
DOI : 10.5244/C.22.99

M. Jain, J. C. Van-gemert, H. Jégou, P. Bouthemy, and C. G. Snoek, Action Localization with Tubelets from Motion, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.100
URL : https://hal.archives-ouvertes.fr/hal-00996844

W. Chen, C. Xiong, R. Xu, and J. Corso, Actionness Ranking with Lattice Conditional Ordinal Random Fields, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.101

D. Oneata, J. Revaud, J. Verbeek, and C. Schmid, Spatio-temporal Object Detection Proposals, In: ECCV, 2014.
DOI : 10.1007/978-3-319-10578-9_48
URL : https://hal.archives-ouvertes.fr/hal-01021902

P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce et al., Weakly Supervised Action Labeling in Videos under Ordering Constraints, In: ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_41
URL : https://hal.archives-ouvertes.fr/hal-01053967

O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459279

M. Hoai, L. Torresani, F. De-la-torre, and C. Rother, Learning discriminative localization from weakly labeled data, Pattern Recognition, vol.47, issue.3, 2014.
DOI : 10.1016/j.patcog.2013.09.028

P. Siva and T. Xiang, Weakly Supervised Action Detection, Proceedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.65

I. Laptev, On space-time interest points, IJCV, 2005.

E. A. Mosabbeb, R. Cabral, F. De-la-torre, and M. Fathy, Multi-label Discriminative Weakly-Supervised Human Activity Recognition and Localization, 2014.
DOI : 10.1007/978-3-319-16814-2_16

S. Ma, J. Zhang, N. Ikizler-cinbis, and S. Sclaroff, Action Recognition and Localization by Hierarchical Space-Time Segments, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.341

W. Chen and J. J. Corso, Action Detection by Implicit Intentional Motion Clustering, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.377

N. Shapovalova, A. Vahdat, K. Cannons, T. Lan, and G. Mori, Similarity Constrained Latent Support Vector Machine: An Application to Weakly Supervised Action Classification, In: ECCV, 2012.
DOI : 10.1007/978-3-642-33786-4_5

H. Boyraz, S. Z. Masood, B. Liu, M. Tappen, and H. Foroosh, Action Recognition by Weakly-Supervised Discriminative Region Localization, Proceedings of the British Machine Vision Conference 2014, 2014.
DOI : 10.5244/C.28.111

T. Lan, Y. Zhu, A. R. Zamir, and S. Savarese, Action Recognition by Hierarchical Mid-Level Action Elements, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.517

M. D. Rodriguez, J. Ahmed, and M. Shah, Action MACH: a spatio-temporal Maximum Average Correlation Height filter for action recognition, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587727

H. Jhuang, J. Gall, S. Zuffi, C. Schmid, and M. J. Black, Towards Understanding Action Recognition, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.396
URL : https://hal.archives-ouvertes.fr/hal-00906902

K. Soomro, A. R. Zamir, and M. Shah, UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild, CRCV-TR-12-01, 2012.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, In: ICLR, 2015.

J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, Image Classification with the Fisher Vector: Theory and Practice, International Journal of Computer Vision, vol.105, issue.3, 2013.
DOI : 10.1007/s11263-013-0636-x