T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, High Accuracy Optical Flow Estimation Based on a Theory for Warping, ECCV, 2004.
DOI : 10.1007/978-3-540-24673-2_3

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.1732

L. Cao, Z. Liu, and T. S. Huang, Cross-dataset action detection, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539875

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.459.6620

W. Chen and J. J. Corso, Action Detection by Implicit Intentional Motion Clustering, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.377

W. Chen, C. Xiong, R. Xu, and J. Corso, Actionness Ranking with Lattice Conditional Ordinal Random Fields, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.101

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.671.8057

M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, 2010.
DOI : 10.1007/s11263-009-0275-4

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.6629

C. Feichtenhofer, A. Pinz, and A. Zisserman, Convolutional Two-Stream Network Fusion for Video Action Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.213

URL : http://arxiv.org/abs/1604.06573

J. van Gemert, M. Jain, E. Gati, and C. G. Snoek, APT: Action localization proposals from dense trajectories, Proceedings of the British Machine Vision Conference 2015, 2015.
DOI : 10.5244/C.29.177

URL : https://pure.uva.nl/ws/files/2568095/170073_paper177.pdf

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.81

URL : http://arxiv.org/abs/1311.2524

G. Gkioxari and J. Malik, Finding action tubes, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298676

W. Hu, T. Tan, L. Wang, and S. Maybank, A Survey on Visual Surveillance of Object Motion and Behaviors, IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), vol.34, issue.3, 2004.
DOI : 10.1109/TSMCC.2004.829274

M. Jain, J. van Gemert, H. Jégou, P. Bouthemy, and C. G. Snoek, Action Localization with Tubelets from Motion, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.100

URL : https://hal.archives-ouvertes.fr/hal-00996844

H. Jhuang, J. Gall, S. Zuffi, C. Schmid, and M. J. Black, Towards Understanding Action Recognition, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.396

URL : https://hal.archives-ouvertes.fr/hal-00906902

T. Kroeger, R. Timofte, D. Dai, and L. Van Gool, Fast Optical Flow Using Dense Inverse Search, ECCV, 2016.
DOI : 10.1007/978-3-319-46493-0_29

URL : http://arxiv.org/abs/1603.03590

T. Lan, Y. Wang, and G. Mori, Discriminative figure-centric models for joint action localization and recognition, ICCV, 2011.

I. Laptev and P. Pérez, Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409105

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.80.1618

Z. Li, E. Gavves, M. Jain, and C. G. Snoek, VideoLSTM convolves, attends and flows for action recognition, arXiv preprint, 2016.

T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft COCO: Common Objects in Context, ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_48

URL : http://arxiv.org/abs/1405.0312

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed et al., SSD: Single Shot MultiBox Detector, ECCV, 2016.
DOI : 10.1007/978-3-319-46448-0_2

URL : http://arxiv.org/pdf/1512.02325

M. M. Puscas, E. Sangineto, D. Culibrk, and N. Sebe, Unsupervised Tube Extraction Using Transductive Learning and Dense Trajectories, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.193

S. Oh, A. Hoogs, A. Perera, N. Cuntoor, C. Chen et al., A large-scale benchmark dataset for event recognition in surveillance video, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995586

D. Oneata, J. Revaud, J. Verbeek, and C. Schmid, Spatiotemporal object detection proposals, ECCV, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01021902

X. Peng and C. Schmid, Multi-region Two-Stream R-CNN for Action Detection, ECCV, 2016.

URL : https://hal.archives-ouvertes.fr/hal-01349107

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You Only Look Once: Unified, Real-Time Object Detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.91

URL : http://arxiv.org/abs/1506.02640

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS, 2015.
DOI : 10.1109/TPAMI.2016.2577031

URL : http://arxiv.org/abs/1506.01497

M. D. Rodriguez, J. Ahmed, and M. Shah, Action MACH: a spatio-temporal Maximum Average Correlation Height filter for action recognition, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587727

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.152.8729

S. Saha, G. Singh, M. Sapienza, P. H. Torr, and F. Cuzzolin, Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos, Proceedings of the British Machine Vision Conference 2016, 2016.
DOI : 10.5244/C.30.58

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, NIPS, 2014.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, ICLR, 2015.

G. Singh, S. Saha, M. Sapienza, P. Torr, and F. Cuzzolin, Online real time multiple spatiotemporal action localisation and prediction on a single platform, arXiv preprint, 2016.

K. Soomro, A. R. Zamir, and M. Shah, UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild, CRCV-TR, 2012.

J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders, Selective Search for Object Recognition, International Journal of Computer Vision, vol.104, issue.2, 2013.
DOI : 10.1007/s11263-013-0620-5

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.361.3382

S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell et al., Sequence to Sequence -- Video to Text, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.515

URL : http://arxiv.org/abs/1505.00487

L. Wang, Y. Qiao, X. Tang, and L. Van Gool, Actionness Estimation Using Hybrid Fully Convolutional Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.296

URL : http://arxiv.org/abs/1604.07279

L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin et al., Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, ECCV, 2016.
DOI : 10.1007/978-3-319-46484-8_2

URL : http://arxiv.org/abs/1608.00859

P. Weinzaepfel, Z. Harchaoui, and C. Schmid, Learning to Track for Spatio-Temporal Action Localization, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.362

URL : https://hal.archives-ouvertes.fr/hal-01159941

L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal et al., Describing Videos by Exploiting Temporal Structure, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.512

URL : http://arxiv.org/abs/1502.08029

G. Yu and J. Yuan, Fast action proposals for human action detection and search, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298735