B. Alexe, T. Deselaers, and V. Ferrari, Measuring the Objectness of Image Windows, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.11, pp.2189-2202, 2012.
DOI : 10.1109/TPAMI.2012.28

S. An, P. Peursum, W. Liu, and S. Venkatesh, Efficient algorithms for subwindow search in object detection and localization, CVPR, pp.264-271, 2009.

K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, Proceedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.76

Q. Chen, Z. Song, R. Feris, A. Datta, L. Cao et al., Efficient Maximum Appearance Search for Large-Scale Object Detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.410

R. Cinbis, J. Verbeek, and C. Schmid, Image categorization using Fisher kernels of non-iid image models, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247926

URL : https://hal.archives-ouvertes.fr/hal-00685943

R. Cinbis, J. Verbeek, and C. Schmid, Segmentation Driven Object Detection with Fisher Vectors, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.369

URL : https://hal.archives-ouvertes.fr/hal-00873134

G. Csurka and F. Perronnin, An Efficient Approach to Semantic Segmentation, International Journal of Computer Vision, vol.95, issue.2, pp.198-212, 2011.
DOI : 10.1007/s11263-010-0344-8

M. Van-den-bergh, G. Roig, X. Boix, S. Manen, and L. Van-gool, Online video seeds for temporal window objectness, ICCV, 2013.

O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459279

A. Gaidon, Z. Harchaoui, and C. Schmid, Actom sequence models for efficient action detection, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995646

URL : https://hal.archives-ouvertes.fr/inria-00575217

H. Harzallah, F. Jurie, and C. Schmid, Combining efficient object localization and image classification, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459257

URL : https://hal.archives-ouvertes.fr/inria-00439516

T. Jaakkola and D. Haussler, Exploiting generative models in discriminative classifiers, NIPS, 1999.

M. Jain, H. Jégou, and P. Bouthemy, Better exploiting motion for better action recognition, CVPR, 2013.

H. Jégou, F. Perronnin, M. Douze, J. Sánchez, P. Pérez et al., Aggregating Local Image Descriptors into Compact Codes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.9, pp.1704-1716, 2012.
DOI : 10.1109/TPAMI.2011.235

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126543

C. Lampert, M. Blaschko, and T. Hofmann, Efficient Subwindow Search: A Branch and Bound Framework for Object Localization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.12, pp.2129-2142, 2009.
DOI : 10.1109/TPAMI.2009.144

I. Laptev and P. Pérez, Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409105

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.68

URL : https://hal.archives-ouvertes.fr/inria-00548585

Z. Li, E. Gavves, K. Van-de-sande, C. Snoek, and A. Smeulders, Codemaps - Segment, Classify and Search Objects Locally, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.454

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.431.4858

M. Marszalek, I. Laptev, and C. Schmid, Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206557

URL : https://hal.archives-ouvertes.fr/inria-00548645

D. Oneata, J. Verbeek, and C. Schmid, Action and Event Recognition with Fisher Vectors on a Compact Feature Set, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.228

URL : https://hal.archives-ouvertes.fr/hal-00873662

F. Perronnin and C. Dance, Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383266

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_11

URL : https://hal.archives-ouvertes.fr/inria-00548630

J. Sánchez and F. Perronnin, High-dimensional signature compression for large-scale image classification, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995504

J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, Image Classification with the Fisher Vector: Theory and Practice, International Journal of Computer Vision, vol.105, issue.3, pp.222-245, 2013.
DOI : 10.1007/s11263-013-0636-x

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), 2004.
DOI : 10.1109/ICPR.2004.1334462

C. Sun and R. Nevatia, Large-scale web video event classification by use of Fisher Vectors, 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013.
DOI : 10.1109/WACV.2013.6474994

J. Uijlings, K. Van-de-sande, T. Gevers, and A. Smeulders, Selective Search for Object Recognition, International Journal of Computer Vision, vol.104, issue.2, pp.154-171, 2013.
DOI : 10.1007/s11263-013-0620-5

A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman, Multiple kernels for object detection, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459183

P. Viola and M. Jones, Robust Real-Time Face Detection, International Journal of Computer Vision, vol.57, issue.2, pp.137-154, 2004.
DOI : 10.1023/B:VISI.0000013087.49260.fb

H. Wang, A. Kläser, C. Schmid, and C. Liu, Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, pp.60-79, 2013.
DOI : 10.1007/s11263-012-0594-8

URL : https://hal.archives-ouvertes.fr/hal-00725627

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.441

URL : https://hal.archives-ouvertes.fr/hal-00873267

X. Wang, L. Wang, and Y. Qiao, A Comparative Study of Encoding, Pooling and Normalization Methods for Action Recognition, ACCV, 2012.
DOI : 10.1007/978-3-642-37431-9_44

C. Xu, C. Xiong, and J. Corso, Streaming Hierarchical Video Segmentation, ECCV, 2012.
DOI : 10.1007/978-3-642-33783-3_45

J. Yuan, Z. Liu, and Y. Wu, Discriminative subvolume search for efficient action detection, CVPR, 2009.