B. Alexe, T. Deselaers, and V. Ferrari, What is an object?, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540226

B. Alexe, T. Deselaers, and V. Ferrari, Measuring the Objectness of Image Windows, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.11, 2012.
DOI : 10.1109/TPAMI.2012.28

W. Brendel and S. Todorovic, Video object segmentation by tracking regions, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459242

L. Cao, Z. Liu, and T. S. Huang, Cross-dataset action detection, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539875

N. Dalal, B. Triggs, and C. Schmid, Human Detection Using Oriented Histograms of Flow and Appearance, ECCV, 2006.
DOI : 10.1023/A:1008162616689
URL : https://hal.archives-ouvertes.fr/inria-00548587

T. Deselaers, B. Alexe, and V. Ferrari, Weakly Supervised Localization and Learning with Generic Knowledge, International Journal of Computer Vision, vol.100, issue.3, pp.275-293, 2012.
DOI : 10.1007/s11263-012-0538-3

I. Endres and D. Hoiem, Category Independent Object Proposals, ECCV, 2010.
DOI : 10.1007/978-3-642-15555-0_42

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4

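I. Everts, J. van Gemert, and T. Gevers, Evaluation of color STIPs for human action recognition, CVPR, 2013.

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1627-1645, 2010.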

J. Feng, Y. Wei, L. Tao, C. Zhang, and J. Sun, Salient object detection by composition, ICCV, 2011.

P. Huber, Robust statistics, 1981.
DOI : 10.1002/0471725250

M. Jain, H. Jégou, and P. Bouthemy, Better Exploiting Motion for Better Action Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.330
URL : https://hal.archives-ouvertes.fr/hal-00813014

A. Kläser, M. Marszałek, C. Schmid, and A. Zisserman, Human Focused Action Localization in Video, Trends and Topics in Computer Vision, pp.219-233, 2012.
DOI : 10.1007/978-3-642-35749-7_17

C. H. Lampert, M. B. Blaschko, and T. Hofmann, Beyond sliding windows: Object localization by efficient subwindow search, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587586
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.4517

T. Lan, Y. Wang, and G. Mori, Discriminative figure-centric models for joint action localization and recognition, ICCV, 2011.

S. Manen, M. Guillaumin, and L. Van Gool, Prime Object Proposals with Randomized Prim's Algorithm, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.315
URL : https://lirias.kuleuven.be/bitstream/123456789/450464/1/3716_final_OA.pdf

J. Odobez and P. Bouthemy, Robust Multiresolution Estimation of Parametric Motion Models, Journal of Visual Communication and Image Representation, vol.6, issue.4, pp.348-365, 1995.
DOI : 10.1006/jvci.1995.1029

F. Perronnin and C. R. Dance, Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383266

E. Rahtu, J. Kannala, and M. Blaschko, Learning a category independent object detection cascade, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126351
URL : https://hal.archives-ouvertes.fr/hal-00855735

M. Raptis, I. Kokkinos, and S. Soatto, Discovering discriminative action parts from mid-level video representations, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247807
URL : https://hal.archives-ouvertes.fr/hal-00918807

M. D. Rodriguez, J. Ahmed, and M. Shah, Action MACH: a spatio-temporal Maximum Average Correlation Height filter for action recognition, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587727

S. Sadanand and J. J. Corso, Action bank: A high-level representation of activity in video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247806

Y. Tian, R. Sukthankar, and M. Shah, Spatiotemporal Deformable Part Models for Action Detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.341
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.295.4040

D. Tran and J. Yuan, Optimal spatio-temporal path discovery for video event detection, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995416

D. Tran and J. Yuan, Max-margin structured output regression for spatio-temporal action localization, NIPS, 2012.

D. Tran, J. Yuan, and D. Forsyth, Video event detection: From subvolume localization to spatio-temporal path search, 2013.

R. Trichet and R. Nevatia, Video segmentation with spatio-temporal tubes, 2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance, 2013.
DOI : 10.1109/AVSS.2013.6636661

J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders, Selective Search for Object Recognition, International Journal of Computer Vision, vol.104, issue.2, pp.154-171, 2013.
DOI : 10.1007/s11263-013-0620-5

J. C. van Gemert, C. J. Veenman, and J. Geusebroek, Episode-Constrained Cross-Validation in Video Concept Retrieval, IEEE Transactions on Multimedia, vol.11, issue.4, pp.780-786, 2009.
DOI : 10.1109/TMM.2009.2017619

P. A. Viola and M. J. Jones, Robust Real-Time Face Detection, International Journal of Computer Vision, vol.57, issue.2, pp.137-154, 2004.
DOI : 10.1023/B:VISI.0000013087.49260.fb

H. Wang, A. Kläser, C. Schmid, and C. Liu, Action recognition by dense trajectories, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995407
URL : https://hal.archives-ouvertes.fr/inria-00583818

T. Wang, S. Wang, and X. Ding, Detecting Human Action as the Spatio-Temporal Tube of Maximum Mutual Information, IEEE Transactions on Circuits and Systems for Video Technology, vol.24, issue.2, pp.277-290, 2014.
DOI : 10.1109/TCSVT.2013.2276856

C. Xu and J. Corso, Evaluation of super-voxel methods for early video processing, CVPR, 2012.

C. Xu, C. Xiong, and J. Corso, Streaming Hierarchical Video Segmentation, ECCV, 2012.
DOI : 10.1007/978-3-642-33783-3_45
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.298.7791

J. Yuan, Z. Liu, and Y. Wu, Discriminative Video Pattern Search for Efficient Action Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.9, pp.1728-1743, 2011.
DOI : 10.1109/TPAMI.2011.38