R. Aly, C. Hauff, W. Heeren, D. Hiemstra, F. de Jong et al., The lowlands team at TRECVid, Proceedings of the TRECVid Workshop, 2007.

R. Aly, D. Hiemstra, A. P. de Vries, and H. Rode, The lowlands team at TRECVid, Proceedings of the TRECVid Workshop, 2008.

R. Arandjelović and A. Zisserman, Multiple queries for large scale specific object retrieval, Proceedings of the British Machine Vision Conference, 2012.

R. Arandjelović and A. Zisserman, Three things everyone should know to improve object retrieval, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.

M. Ayari, J. Delhumeau, M. Douze, H. Jégou, D. Potapov et al., INRIA@TRECVID'2011: Copy Detection & Multimedia Event Detection, Proceedings of the TRECVid Workshop, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00648016

K. Chatfield and A. Zisserman, VISOR: Towards On-the-Fly Large-Scale Object Category Retrieval, 2012.
DOI : 10.1007/978-3-642-37444-9_34

S. Clinchant, G. Csurka, F. Perronnin, and J. Renders, XRCE's participation to ImagEval, ImageEval workshop at CVIR, 2007.

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.177

URL : https://hal.archives-ouvertes.fr/inria-00548512

N. Dalal, B. Triggs, and C. Schmid, Human Detection Using Oriented Histograms of Flow and Appearance, Proceedings of the European Conference on Computer Vision, 2006.
DOI : 10.1023/A:1008162616689

URL : https://hal.archives-ouvertes.fr/inria-00548587

M. Everingham, J. Sivic, and A. Zisserman, Taking the bite out of automatic naming of characters in TV video, 2009.

P. Felzenszwalb and D. Huttenlocher, Pictorial Structures for Object Recognition, International Journal of Computer Vision, vol.61, issue.1, 2005.
DOI : 10.1023/B:VISI.0000042934.15159.49

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.6365

J. Gauvain, L. Lamel, and G. Adda, The LIMSI Broadcast News transcription system, Speech Communication, vol.37, issue.1-2, pp.89-108, 2002.
DOI : 10.1016/S0167-6393(01)00061-9

URL : https://hal.archives-ouvertes.fr/hal-01434493

G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, Labeled faces in the wild: A database for studying face recognition in unconstrained environments, 2007.

H. Jégou, M. Douze, G. Gravier, C. Schmid, and P. Gros, INRIA LEAR-TEXMEX: Video copy detection task, Proceedings of the TRECVid Workshop, 2010.

H. Jégou, M. Douze, C. Schmid, and P. Pérez, Aggregating local descriptors into a compact image representation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3304-3311, 2010.
DOI : 10.1109/CVPR.2010.5540039

J. Krapac, J. Verbeek, and F. Jurie, Modeling spatial layout with fisher vectors for image categorization, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126406

URL : https://hal.archives-ouvertes.fr/inria-00612277

S. Lazebnik, C. Schmid, and J. Ponce, Spatial Pyramid Matching, Object Categorization: Computer and Human Vision Perspectives, pp.401-415, 2009.
DOI : 10.1017/CBO9780511635465.022

URL : https://hal.archives-ouvertes.fr/inria-00548647

D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

J. Matas, O. Chum, M. Urban, and T. Pajdla, Robust wide baseline stereo from maximally stable extremal regions, Proceedings of the British Machine Vision Conference, pp.384-393, 2002.

K. McGuinness, ..., A. Zisserman, A. Smeaton, and H. Beunders, AXES at TRECVid, Proceedings of the TRECVid Workshop.

P. Natarajan, S. Wu, S. Vitaladevuni, X. Zhuang, S. Tsakalidis et al., Multimodal feature fusion for robust event detection in web videos, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247814

D. Oneata, J. Verbeek, and C. Schmid, Action and Event Recognition with Fisher Vectors on a Compact Feature Set, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.228

URL : https://hal.archives-ouvertes.fr/hal-00873662

M. Osian and L. Van Gool, Video shot characterization, Proceedings of the TRECVid Workshop, 2003.
DOI : 10.1007/s00138-004-0141-x

P. Over, G. Awad, M. Michel, J. Fiscus, G. Sanders et al., TRECVID 2013 - an overview of the goals, tasks, data, evaluation mechanisms and metrics, Proceedings of TRECVID 2013, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953093

O. M. Parkhi, A. Vedaldi, and A. Zisserman, On-the-fly specific person retrieval, 2012 13th International Workshop on Image Analysis for Multimedia Interactive Services, 2012.
DOI : 10.1109/WIAMIS.2012.6226775

M. Perďoch, O. Chum, and J. Matas, Efficient representation of local geometry for large scale object retrieval, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, Proceedings of the European Conference on Computer Vision, 2010.
DOI : 10.1007/978-3-642-15561-1_11

URL : https://hal.archives-ouvertes.fr/inria-00548630

J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, Object retrieval with large vocabularies and fast spatial matching, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383172

J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, Image Classification with the Fisher Vector: Theory and Practice, International Journal of Computer Vision, vol.105, issue.3, pp.222-245, 2013.
DOI : 10.1007/s11263-013-0636-x

J. A. Shaw and E. A. Fox, Combination of multiple searches, The Third Text REtrieval Conference (TREC-3), pp.243-252, 1994.

E. Shechtman and M. Irani, Matching Local Self-Similarities across Images and Videos, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383198

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.76.1297

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238663

S. Strassel, A. Morris, J. Fiscus, C. Caruso, H. Lee et al., Creating HAVIC: Heterogeneous audio visual internet collection, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), European Language Resources Association (ELRA), 2012.

S. Vempati, M. Jain, O. M. Parkhi, C. V. Jawahar, M. Marszalek et al., Oxford-IIIT TRECVID 2009 - Notebook Paper, Proceedings of the TRECVid Workshop, 2009.

H. Wang, A. Kläser, C. Schmid, and C. Liu, Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, 2013.
DOI : 10.1007/s11263-012-0594-8

URL : https://hal.archives-ouvertes.fr/hal-00725627

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.441

URL : https://hal.archives-ouvertes.fr/hal-00873267