K. Soomro, A. R. Zamir, and M. Shah, UCF101: A Dataset of 101 Human Action Classes from Videos in the Wild, p.18, 2012.

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126543
URL : http://cbcl.mit.edu/publications/ps/Kuehne_etal_iccv11.pdf

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, p.36, 2013.
DOI : 10.1109/ICCV.2013.441
URL : https://hal.archives-ouvertes.fr/hal-00873267

D. Oneata, J. Verbeek, and C. Schmid, Action and Event Recognition with Fisher Vectors on a Compact Feature Set, 2013 IEEE International Conference on Computer Vision, p.36
DOI : 10.1109/ICCV.2013.228
URL : https://hal.archives-ouvertes.fr/hal-00873662

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, p.26, 2014.

J. Liu, J. Luo, and M. Shah, Recognizing realistic actions from videos "in the wild", IEEE CVPR, 2009.

H. Pirsiavash and D. Ramanan, Detecting activities of daily living in first-person camera views, 2012 IEEE Conference on Computer Vision and Pattern Recognition
DOI : 10.1109/CVPR.2012.6248010

M. S. Ryoo and L. Matthies, First-Person Activity Recognition: What Are They Doing to Me?, 2013 IEEE Conference on Computer Vision and Pattern Recognition
DOI : 10.1109/CVPR.2013.352
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1984

S. Satkin and M. Hebert, Modeling the Temporal Extent of Actions, p.18, 2010.
DOI : 10.1007/978-3-642-15549-9_39

P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce et al., Weakly Supervised Action Labeling in Videos under Ordering Constraints, p.33, 2014.
DOI : 10.1007/978-3-319-10602-1_41
URL : https://hal.archives-ouvertes.fr/hal-01053967

O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, p.33, 2009.
DOI : 10.1109/ICCV.2009.5459279

M. Hoai, Z. Lan, and F. De la Torre, Joint segmentation and classification of human actions in video, CVPR 2011, p.33, 2011.
DOI : 10.1109/CVPR.2011.5995470

H. Pirsiavash and D. Ramanan, Parsing Videos of Actions with Segmental Grammars, 2014 IEEE Conference on Computer Vision and Pattern Recognition, p.33, 2014.
DOI : 10.1109/CVPR.2014.85

Z. Shou, D. Wang, and S. Chang, Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI : 10.1109/CVPR.2016.119
URL : http://arxiv.org/abs/1601.02129

A. Richard and J. Gall, Temporal Action Detection Using a Statistical Language Model, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI : 10.1109/CVPR.2016.341

Y. Ke, R. Sukthankar, and M. Hebert, Event Detection in Crowded Videos, 2007 IEEE 11th International Conference on Computer Vision, p.33, 2007.
DOI : 10.1109/ICCV.2007.4409011
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.4575

A. Kläser, M. Marszałek, C. Schmid, and A. Zisserman, Human Focused Action Localization in Video, International Workshop on Sign, Gesture, and Activity (SGA), ECCV Workshops, 2010.
DOI : 10.1007/978-3-642-35749-7_17
URL : https://hal.archives-ouvertes.fr/inria-00514845

I. Laptev and P. Pérez, Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409105
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.80.1618

Y. Tian, R. Sukthankar, and M. Shah, Spatiotemporal Deformable Part Models for Action Detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition, p.33, 2013.
DOI : 10.1109/CVPR.2013.341
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.295.4040

K. Soomro, H. Idrees, and M. Shah, Action Localization in Videos through Context Walk, 2015 IEEE International Conference on Computer Vision (ICCV)
DOI : 10.1109/ICCV.2015.375

K. Soomro, H. Idrees, and M. Shah, Predicting the Where and What of Actors and Actions through Online Action Localization, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.290

A. Gorban, H. Idrees, Y. Jiang, A. Zamir, I. Laptev et al., Action Recognition with a Large Number of Classes, pp.2015-2025

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), 2004.
DOI : 10.1109/ICPR.2004.1334462

M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, Actions as space-time shapes, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.28
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.8218

M. Rodriguez, J. Ahmed, and M. Shah, Action MACH: a spatio-temporal Maximum Average Correlation Height filter for action recognition, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587727
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.152.8729

Y. Ke, R. Sukthankar, and M. Hebert, Efficient visual event detection using volumetric features, IEEE ICCV, vol.5, p.33, 2005.

J. Yuan, Z. Liu, and Y. Wu, Discriminative Subvolume Search for Efficient Action Detection, IEEE CVPR, vol.5, p.33, 2009.

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, p.10, 2008.
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659

K. K. Reddy and M. Shah, Recognizing 50 human action categories of web videos, Machine Vision and Applications, vol.24, issue.5, pp.971-981, 2013.
DOI : 10.1007/s00138-012-0450-4
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.399.2019

S. Ayache and G. Quénot, Video Corpus Annotation Using Active Learning, European Conference on Information Retrieval, pp.187-198, 2008.
DOI : 10.1007/978-3-540-78646-7_19
URL : https://hal.archives-ouvertes.fr/hal-01089795

E. Yilmaz and J. A. Aslam, Estimating average precision with incomplete and imperfect judgments, Proceedings of the 15th ACM international conference on Information and knowledge management, CIKM '06, 2006.
DOI : 10.1145/1183614.1183633
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.331.5032

S. Strassel, A. Morris, J. Fiscus, C. Caruso, H. Lee et al., Creating HAVIC: Heterogeneous Audio Visual Internet Collection, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), European Language Resources Association (ELRA), 2012.

F. Caba Heilbron, V. Escorcia, B. Ghanem, and J. C. Niebles, ActivityNet: A large-scale video benchmark for human activity understanding, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.10, 2015.
DOI : 10.1109/CVPR.2015.7298698

D. Weinland, E. Boyer, and R. Ronfard, Action Recognition from Arbitrary Views using 3D Exemplars, 2007 IEEE 11th International Conference on Computer Vision, pp.1-7, 2007.
DOI : 10.1109/ICCV.2007.4408849
URL : https://hal.archives-ouvertes.fr/inria-00544741

J. C. Niebles, C. Chen, and F. Li, Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification, p.33, 2010.
DOI : 10.1007/978-3-642-15552-9_29

K. Ning and F. Wu, ZJUDCD Submission at THUMOS Challenge, THUMOS'15 Action Recognition Challenge, p.25, 2015.

X. Peng and C. Schmid, Encoding Feature Maps of CNNs for Action Recognition, THUMOS'15 Action Recognition Challenge, p.25, 2015.

L. Wang, Z. Wang, Y. Xiong, and Y. Qiao, CUHK&SIAT submission for THUMOS'15 action recognition challenge, THUMOS'15 Action Recognition Challenge, p.25, 2015.

Y. Liu, B. Fan, S. Zhao, Y. Xu, and Y. Han, Tianjin University Submission at THUMOS Challenge, THUMOS'15 Action Recognition Challenge, p.25, 2015.

K. Ohnishi and T. Harada, MIL-UTokyo at THUMOS Challenge, THUMOS'15 Action Recognition Challenge, p.25, 2015.

J. Yuan, Y. Pei, B. Ni, P. Moulin, and A. Kassim, ADSC Submission at THUMOS Challenge, THUMOS'15 Action Recognition Challenge, p.33, 2015.

J. Cai and Q. Tian, UTSA submission to THUMOS 2015, THUMOS'15 Action Recognition Challenge, p.25, 2015.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint, p.22, 2014.

M. D. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, p.22, 2014.
DOI : 10.1007/978-3-319-10590-1_53
URL : http://arxiv.org/abs/1311.2901

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning Spatiotemporal Features with 3D Convolutional Networks, 2015 IEEE International Conference on Computer Vision (ICCV), p.22, 2015.
DOI : 10.1109/ICCV.2015.510
URL : http://arxiv.org/abs/1412.0767

Z. Xu, Y. Yang, and A. G. Hauptmann, A discriminative CNN video representation for event detection, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.25, 2015.
DOI : 10.1109/CVPR.2015.7298789

H. Jégou, M. Douze, C. Schmid, and P. Pérez, Aggregating local descriptors into a compact image representation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p.23, 2010.
DOI : 10.1109/CVPR.2010.5540039

J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, Image Classification with the Fisher Vector: Theory and Practice, International Journal of Computer Vision, vol.105, issue.3, pp.222-245, 2013.
DOI : 10.1007/s11263-013-0636-x

Z. Lan, M. Lin, X. Li, A. G. Hauptmann, and B. Raj, Beyond gaussian pyramid: Multi-skip feature stacking for action recognition, IEEE CVPR, p.23, 2015.

K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, Return of the Devil in the Details: Delving Deep into Convolutional Nets, Proceedings of the British Machine Vision Conference 2014, p.24, 2014.
DOI : 10.5244/C.28.6

M. Raptis and L. Sigal, Poselet Key-Framing: A Model for Human Activity Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, p.33, 2013.
DOI : 10.1109/CVPR.2013.342

K. Tang, L. Fei-Fei, and D. Koller, Learning latent temporal structure for complex event detection, 2012 IEEE Conference on Computer Vision and Pattern Recognition, p.33, 2012.
DOI : 10.1109/CVPR.2012.6247808
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.309.7443