C. Wojek, A. Geiger, and R. Urtasun, Joint 3d estimation of objects and scene layout, NIPS, 2011.

R. Adams, H. Wallach, and Z. Ghahramani, Learning the structure of deep sparse graphical models, AISTATS, 2010.

B. Alexe, T. Deselaers, and V. Ferrari, What is an object?, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540226

B. Alexe, T. Deselaers, and V. Ferrari, Measuring the Objectness of Image Windows, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.11, pp.2189-2202, 2012.
DOI : 10.1109/TPAMI.2012.28

R. Arandjelović and A. Zisserman, Three things everyone should know to improve object retrieval, CVPR, 2012.

R. Arandjelović, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, NetVLAD: CNN architecture for weakly supervised place recognition, Arxiv preprint, 2015.

A. Arnab, S. Jayasumana, S. Zheng, and P. Torr, Higher order potentials in end-to-end trainable conditional random fields, 2015.

S. Bagon, O. Brostovski, M. Galun, and M. Irani, Detecting and sketching the common, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540233

B. Bai, J. Weston, D. Grangier, R. Collobert, K. Sadamasa et al., Learning to rank with (a lot of) word features, Information Retrieval, vol.22, issue.1, pp.291-314, 2010.
DOI : 10.1007/s10791-009-9117-9

K. Barnard, P. Duygulu, D. Forsyth, N. De-freitas, D. Blei et al., Matching words and pictures, JMLR, vol.3, pp.1107-1135, 2003.

R. Bekkerman and J. Jeon, Multi-modal Clustering for Multimedia Collections, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383223

A. Bellet, A. Habrard, and M. Sebban, A Survey on Metric Learning for Feature Vectors and Structured Data. ArXiv e-prints, 1306.

S. Bengio, J. Weston, and D. Grangier, Label embedding trees for large multi-class tasks, NIPS, 2011.

T. Berg and D. Forsyth, Animals on the Web, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.57

T. Berg, A. Berg, J. Edwards, M. Maire, R. White et al., Names and faces in the news, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., 2004.
DOI : 10.1109/CVPR.2004.1315253

H. Bilen and A. Vedaldi, Weakly Supervised Deep Detection Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.311

H. Bilen, M. Pedersoli, and T. Tuytelaars, Weakly supervised object detection with posterior regularization, BMVC, 2014.

C. Bishop, Pattern recognition and machine learning, 2006.

L. Bottou, Large-scale machine learning with stochastic gradient descent, COMPSTAT, 2010.

J. Bradley and C. Guestrin, Learning tree conditional random fields, ICML, 2010.

S. Branson, C. Wah, F. Schroff, B. Babenko, P. Welinder et al., Visual Recognition with Humans in the Loop, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_32

T. Brox and J. Malik, Object Segmentation by Long Term Analysis of Point Trajectories, ECCV, 2010.
DOI : 10.1007/978-3-642-15555-0_21

G. Carneiro, A. Chan, P. Moreno, and N. Vasconcelos, Supervised Learning of Semantic Classes for Image Annotation and Retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.3, pp.394-410, 2007.
DOI : 10.1109/TPAMI.2007.61

K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, Proceedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.76

K. Chatfield, R. Arandjelović, O. Parkhi, and A. Zisserman, On-the-fly learning for visual search of large-scale image and video datasets, International Journal of Multimedia Information Retrieval, vol.38, issue.2, 2015.
DOI : 10.1007/s13735-015-0077-0

Q. Chen, Z. Song, R. Feris, A. Datta, L. Cao et al., Efficient Maximum Appearance Search for Large-Scale Object Detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.410

X. Chen, A. Shrivastava, and A. Gupta, NEIL: Extracting Visual Knowledge from Web Data, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.178

M. Cho, S. Kwak, C. Schmid, and J. Ponce, Unsupervised object discovery and localization in the wild, CVPR, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01110036

M. Choi, J. Lim, A. Torralba, and A. Willsky, Exploiting hierarchical context on a large database of object categories, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540221

S. Chopra, R. Hadsell, and Y. Lecun, Learning a Similarity Metric Discriminatively, with Application to Face Verification, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.202

C. Chow and C. Liu, Approximating discrete probability distributions with dependence trees, IEEE Transactions on Information Theory, vol.14, issue.3, pp.462-467, 1968.
DOI : 10.1109/TIT.1968.1054142

O. Chum and A. Zisserman, An Exemplar Model for Learning Object Classes, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383050

M. Cimpoi, S. Maji, and A. Vedaldi, Deep filter banks for texture recognition and segmentation, CVPR, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01263622

R. Cinbis, J. Verbeek, and C. Schmid, Unsupervised metric learning for face identification in TV video, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126415
URL : https://hal.archives-ouvertes.fr/inria-00611682

R. Cinbis, J. Verbeek, and C. Schmid, Image categorization using Fisher kernels of non-iid image models, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247926
URL : https://hal.archives-ouvertes.fr/hal-00685943

R. Cinbis, J. Verbeek, and C. Schmid, Segmentation Driven Object Detection with Fisher Vectors, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.369
URL : https://hal.archives-ouvertes.fr/hal-00873134

R. Cinbis, J. Verbeek, and C. Schmid, Multi-fold MIL Training for Weakly Supervised Object Localization, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.309
URL : https://hal.archives-ouvertes.fr/hal-00975746

R. Cinbis, J. Verbeek, and C. Schmid, Approximate Fisher Kernels of Non-iid Image Models for Image Categorization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.6, 2016.
DOI : 10.1109/TPAMI.2015.2484342
URL : https://hal.archives-ouvertes.fr/hal-01211201

R. Cinbis, J. Verbeek, and C. Schmid, Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.1, 2016.
DOI : 10.1109/TPAMI.2016.2535231
URL : https://hal.archives-ouvertes.fr/hal-01123482

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, pp.273-297, 1995.
DOI : 10.1007/BF00994018

D. Crandall and D. Huttenlocher, Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition, ECCV, 2006.
DOI : 10.1007/11744023_2

G. Csurka and F. Perronnin, An Efficient Approach to Semantic Segmentation, International Journal of Computer Vision, vol.60, issue.2, pp.198-212, 2011.
DOI : 10.1007/s11263-010-0344-8

G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, ECCV Int. Workshop on Stat. Learning in Computer Vision, 2004.

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512

J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon, Information-theoretic metric learning, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273523
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.114.4476

A. Dempster, N. Laird, and D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), vol.39, issue.1, pp.1-38, 1977.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A large-scale hierarchical image database, CVPR, 2009.

J. Deng, A. Berg, K. Li, and L. Fei-fei, What Does Classifying More Than 10,000 Image Categories Tell Us?, ECCV, 2010.
DOI : 10.1007/978-3-642-15555-0_6

T. Deselaers, B. Alexe, and V. Ferrari, Localizing Objects While Learning Their Appearance, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_33

T. Deselaers, B. Alexe, and V. Ferrari, Weakly Supervised Localization and Learning with Generic Knowledge, International Journal of Computer Vision, vol.73, issue.2, pp.257-293, 2012.
DOI : 10.1007/s11263-012-0538-3

T. Dietterich, R. Lathrop, and T. Lozano-pérez, Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, vol.89, issue.1-2, pp.31-71, 1997.
DOI : 10.1016/S0004-3702(96)00034-3

C. Doersch, A. Gupta, and A. Efros, Unsupervised Visual Representation Learning by Context Prediction, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.167

O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459279

M. Everingham, J. Sivic, and A. Zisserman, "Hello! My name is... Buffy" -- Automatic Naming of Characters in TV Video, Proceedings of the British Machine Vision Conference 2006, 2006.
DOI : 10.5244/C.20.92

M. Everingham, J. Sivic, and A. Zisserman, Taking the bite out of automated naming of characters in TV video, Image and Vision Computing, vol.27, issue.5, pp.545-559, 2009.
DOI : 10.1016/j.imavis.2008.04.018

M. Everingham, L. Van-gool, C. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4

P. Felzenszwalb, R. Girshick, D. Mcallester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, 2010.
DOI : 10.1109/TPAMI.2009.167

S. Feng, R. Manmatha, and V. Lavrenko, Multiple Bernoulli relevance models for image and video annotation, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., 2004.
DOI : 10.1109/CVPR.2004.1315274

R. Fergus, P. Perona, and A. Zisserman, A Visual Category Filter for Google Images, ECCV, 2004.
DOI : 10.1007/978-3-540-24670-1_19

R. Fergus, L. Fei-fei, P. Perona, and A. Zisserman, Learning object categories from Google's image search, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.142

B. Fernando, E. Gavves, J. Oramas, A. Ghodrati, and T. Tuytelaars, Modeling video evolution for action recognition, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299176

A. Frome, G. Corrado, J. Shlens, S. Bengio, J. Dean et al., DeViSE: A deep visual-semantic embedding model, NIPS, 2013.

T. Gao and D. Koller, Discriminative learning of relaxed hierarchy for large-scale visual recognition, ICCV, 2011.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.81

A. Globerson and S. Roweis, Metric learning by collapsing classes, NIPS, 2006.

I. Goodfellow, J. Pouget-abadie, M. Mirza, B. Xu, D. Warde-farley et al., Generative adversarial nets, NIPS, 2014.

D. Grangier and S. Bengio, A Discriminative Kernel-Based Approach to Rank Images from Text Queries, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.8, pp.1371-1384, 2008.
DOI : 10.1109/TPAMI.2007.70791

K. Gregor, I. Danihelka, A. Graves, and D. Wierstra, DRAW: A recurrent neural network for image generation, ICML, 2015.

C. Gu, P. Arbeláez, Y. Lin, K. Yu, and J. Malik, Multi-component Models for Object Detection, ECCV, 2012.
DOI : 10.1007/978-3-642-33765-9_32

M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, Automatic face naming with caption-based supervision, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587603
URL : https://hal.archives-ouvertes.fr/inria-00321048

M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459266
URL : https://hal.archives-ouvertes.fr/inria-00439276

M. Guillaumin, J. Verbeek, and C. Schmid, Is that you? Metric learning approaches for face identification, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459197
URL : https://hal.archives-ouvertes.fr/inria-00439290

M. Guillaumin, J. Verbeek, and C. Schmid, Multimodal semi-supervised learning for image classification, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540120
URL : https://hal.archives-ouvertes.fr/inria-00548640

M. Guillaumin, J. Verbeek, and C. Schmid, Multiple Instance Metric Learning from Automatically Labeled Bags of Faces, ECCV, 2010.
DOI : 10.1007/978-3-642-15549-9_46
URL : https://hal.archives-ouvertes.fr/inria-00548639

M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, Face Recognition from Caption-Based Supervision, International Journal of Computer Vision, vol.57, issue.2, pp.64-82, 2012.
DOI : 10.1007/s11263-011-0447-x
URL : https://hal.archives-ouvertes.fr/inria-00522185

J. Hays and A. Efros, IM2GPS: estimating geographic information from a single image, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587784

K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, ECCV, 2014.

G. Hinton, P. Dayan, B. Frey, and R. Neal, The "wake-sleep" algorithm for unsupervised neural networks, Science, vol.268, issue.5214, pp.1158-1161, 1995.
DOI : 10.1126/science.7761831

D. Hoiem, A. Efros, and M. Hebert, Putting Objects in Perspective, International Journal of Computer Vision, vol.57, issue.2, pp.3-15, 2008.
DOI : 10.1007/s11263-008-0137-5

P. Isola, D. Zoran, D. Krishnan, and E. Adelson, Learning visual groups from co-occurrences in space and time, ICLR, 2016.

T. Jaakkola and D. Haussler, Exploiting generative models in discriminative classifiers, NIPS, 1999.

H. Jégou, M. Douze, and C. Schmid, On the burstiness of visual elements, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206609

H. Jégou, F. Perronnin, M. Douze, J. Sánchez, P. Pérez et al., Aggregating Local Image Descriptors into Compact Codes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.9, pp.1704-1716, 2012.
DOI : 10.1109/TPAMI.2011.235

J. Jeon, V. Lavrenko, and R. Manmatha, Automatic image annotation and retrieval using cross-media relevance models, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '03, 2003.
DOI : 10.1145/860435.860459

Y. Jiang, J. Liu, A. Zamir, G. Toderici, I. Laptev et al., THUMOS challenge: Action recognition with a large number of classes, 2014.

M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul, An Introduction to Variational Methods for Graphical Models, Machine Learning, pp.183-233, 1999.
DOI : 10.1007/978-94-011-5014-9_5

A. Joulin, F. Bach, and J. Ponce, Discriminative clustering for image cosegmentation, CVPR, 2010.

A. Joulin, K. Tang, and L. Fei-fei, Efficient Image and Video Co-localization with Frank-Wolfe Algorithm, ECCV, 2014.
DOI : 10.1007/978-3-319-10599-4_17

F. Khan, R. Anwer, J. Van-de-weijer, A. Bagdanov, M. Vanrell et al., Color attributes for object detection, CVPR, 2012.

R. Kiros, R. Salakhutdinov, and R. Zemel, Unifying visual-semantic embeddings with multimodal neural language models, 2015.

T. Kobayashi, Dirichlet-Based Histogram Feature Transform for Image Classification, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.413

P. Kohli, L. Ladický, and P. Torr, Robust Higher Order Potentials for Enforcing Label Consistency, International Journal of Computer Vision, vol.24, issue.3, pp.302-324, 2009.
DOI : 10.1007/s11263-008-0202-0

M. Köstinger, M. Hirzer, P. Wohlhart, P. Roth, and H. Bischof, Large scale metric learning from equivalence constraints, CVPR, 2012.

P. Krähenbühl and V. Koltun, Efficient inference in fully connected CRFs with Gaussian edge potentials, NIPS, 2011.

J. Krapac, M. Allan, J. Verbeek, and F. Jurie, Improving web image search results using query-relative classifiers, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540092
URL : https://hal.archives-ouvertes.fr/inria-00548636

J. Krapac, J. Verbeek, and F. Jurie, Modeling spatial layout with fisher vectors for image categorization, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126406
URL : https://hal.archives-ouvertes.fr/inria-00612277

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet classification with deep convolutional neural networks, NIPS, 2012.

A. Kulesza and F. Pereira, Structured learning with approximate inference, NIPS, 2008.

B. Kulis, Metric Learning: A Survey, Machine Learning, pp.287-364, 2012.
DOI : 10.1561/2200000019

P. Kulkarni, J. Zepeda, F. Jurie, P. Pérez, and L. Chevallier, Learning the Structure of Deep Architectures Using L1 Regularization, Proceedings of the British Machine Vision Conference 2015, 2015.
DOI : 10.5244/C.29.23
URL : https://hal.archives-ouvertes.fr/hal-01266462

L. Ladický, P. Sturgess, K. Alahari, C. Russell, and P. Torr, What, where & how many? Combining object detectors and CRFs, ECCV, 2010.

C. Lampert, M. Blaschko, and T. Hofmann, Efficient Subwindow Search: A Branch and Bound Framework for Object Localization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.12, p.2129, 2009.
DOI : 10.1109/TPAMI.2009.144

C. Lampert, H. Nickisch, and S. Harmeling, Learning to detect unseen object classes by between-class attribute transfer, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206594

I. Laptev and P. Pérez, Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409105

V. Lavrenko, R. Manmatha, and J. Jeon, A model for learning the semantics of pictures, NIPS, 2003.

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.68
URL : https://hal.archives-ouvertes.fr/inria-00548585

Y. Lecun, B. Boser, J. Denker, D. Henderson, R. Howard et al., Handwritten digit recognition with a back-propagation network, NIPS, 1989.

T. Leung and J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons, International Journal of Computer Vision, vol.43, issue.1, pp.29-44, 2001.
DOI : 10.1023/A:1011126920638

J. Li and J. Wang, Real-time computerized annotation of pictures, Proceedings of the 14th annual ACM international conference on Multimedia, MULTIMEDIA '06, pp.985-1002, 2008.
DOI : 10.1145/1180639.1180841

L. Li, G. Wang, and L. Fei-fei, OPTIMOL: Automatic object picture collection via incremental model learning, CVPR, 2007.

Z. Li, E. Gavves, K. Van-de-sande, C. Snoek, and A. Smeulders, Codemaps - Segment, Classify and Search Objects Locally, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.454

G. Lin, C. Shen, A. Van-den-hengel, and I. Reid, Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.348

Y. Lin, F. Lv, S. Zhu, M. Yang, T. Cour et al., Large-scale image classification: Fast feature extraction and SVM training, CVPR, 2011.

J. Liu, M. Li, Q. Liu, H. Lu, and S. Ma, Image annotation via graph learning, Pattern Recognition, vol.42, issue.2, pp.218-228, 2009.
DOI : 10.1016/j.patcog.2008.04.012

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298965

D. Lowe, Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999.
DOI : 10.1109/ICCV.1999.790410

G. Montufar, R. Pascanu, K. Cho, and Y. Bengio, On the number of linear regions of deep neural networks, 2014.

M. Nagel, T. Mensink, and C. Snoek, Event Fisher Vectors: Robust Encoding Visual Diversity of Visual Streams, Proceedings of the British Machine Vision Conference 2015, 2015.
DOI : 10.5244/C.29.178

H. Noh, S. Hong, and B. Han, Learning Deconvolution Network for Semantic Segmentation, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.178

S. Nowak and M. Huiskes, New strategies for image annotation: Overview of the photo annotation task at ImageCLEF 2010, Working Notes of CLEF, 2010.

Y. Ohta, T. Kanade, and T. Sakai, An analysis system for scenes containing objects with substructures, ICPR, 1978.

A. Oliva and A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope, International Journal of Computer Vision, vol.42, issue.3, pp.145-175, 2001.
DOI : 10.1023/A:1011139631724

B. Olshausen and D. Field, Sparse coding with an overcomplete basis set: A strategy employed by V1?, Vision Research, vol.37, issue.23, pp.3311-3325, 1997.

D. Oneata, Robust and efficient models for action recognition and localization, 2015.
URL : https://hal.archives-ouvertes.fr/tel-01217362

D. Oneata, J. Verbeek, and C. Schmid, Action and Event Recognition with Fisher Vectors on a Compact Feature Set, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.228
URL : https://hal.archives-ouvertes.fr/hal-00873662

D. Oneata, J. Revaud, J. Verbeek, and C. Schmid, Spatio-temporal Object Detection Proposals, ECCV, 2014.
DOI : 10.1007/978-3-319-10578-9_48
URL : https://hal.archives-ouvertes.fr/hal-01021902

D. Oneata, J. Verbeek, and C. Schmid, Efficient Action Localization with Approximately Normalized Fisher Vectors, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.326
URL : https://hal.archives-ouvertes.fr/hal-00979594

P. Over, G. Awad, M. Michel, J. Fiscus, G. Sanders et al., TRECVID 2012 - an overview of the goals, tasks, data, evaluation mechanisms and metrics, Proceedings of TRECVID, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00953826

J. Pan, H. Yang, C. Faloutsos, and P. Duygulu, Automatic multimedia cross-modal correlation discovery, ACM SIGKDD, 2004.

M. Pandey and S. Lazebnik, Scene recognition and weakly supervised object localization with deformable part-based models, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126383

G. Papandreou, L. Chen, K. Murphy, and A. Yuille, Weakly- and semi-supervised learning of a deep convolutional network for semantic image segmentation, ICCV, 2015.

D. Pathak, P. Krähenbühl, and T. Darrell, Constrained Convolutional Neural Networks for Weakly Supervised Segmentation, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.209

J. Pearl, Reverend Bayes on inference engines: A distributed hierarchical approach, Proceedings of the Second National Conference on Artificial Intelligence, 1982.

X. Peng, C. Zou, Y. Qiao, and Q. Peng, Action Recognition with Stacked Fisher Vectors, ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_38

A. Perina, M. Cristani, U. Castellani, V. Murino, and N. Jojic, A hybrid generative/discriminative classification framework based on free-energy terms, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459453

A. Perina, M. Cristani, U. Castellani, V. Murino, and N. Jojic, Free energy score space, NIPS, 2009.

F. Perronnin and C. Dance, Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383266

F. Perronnin and D. Larlus, Fisher vectors meet Neural Networks: A hybrid classification architecture, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298998

F. Perronnin, J. Sánchez, and Y. Liu, Large-scale image categorization with explicit data embedding, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539914

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_11
URL : https://hal.archives-ouvertes.fr/inria-00548630

F. Perronnin, Z. Akata, Z. Harchaoui, and C. Schmid, Towards good practice in large-scale learning for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248090
URL : https://hal.archives-ouvertes.fr/hal-00690014

P. Pinheiro and R. Collobert, Recurrent convolutional neural networks for scene labeling, ICML, 2014.

A. Prest, C. Leistner, J. Civera, C. Schmid, and V. Ferrari, Learning object class detectors from weakly annotated video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248065
URL : https://hal.archives-ouvertes.fr/hal-00695940

A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie, Objects in Context, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4408986

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence
DOI : 10.1109/TPAMI.2016.2577031

O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, Medical Image Computing and Computer-Assisted Intervention, 2015.
DOI : 10.1007/978-3-319-24574-4_28

O. Russakovsky, Y. Lin, K. Yu, and L. Fei-fei, Object-Centric Spatial Pooling for Image Classification, ECCV, 2012.
DOI : 10.1007/978-3-642-33709-3_1

R. Salakhutdinov, J. Tenenbaum, and A. Torralba, One-shot learning with a hierarchical nonparametric bayesian model, ICML Unsupervised and Transfer Learning workshop, 2012.

J. Sánchez and F. Perronnin, High-dimensional signature compression for large-scale image classification, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995504

J. Sánchez and J. Redolfi, Exponential family Fisher vector for image classification, Pattern Recognition Letters, vol.59, pp.26-32, 2015.
DOI : 10.1016/j.patrec.2015.03.010

J. Sánchez, F. Perronnin, and T. De-campos, Modeling the spatial layout of images beyond spatial pyramids, Pattern Recognition Letters, vol.33, issue.16, pp.2216-2223, 2012.
DOI : 10.1016/j.patrec.2012.07.019

J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, Image Classification with the Fisher Vector: Theory and Practice, International Journal of Computer Vision, vol.73, issue.2, pp.222-245, 2013.
DOI : 10.1007/s11263-013-0636-x

S. Satoh, Y. Nakamura, and T. Kanade, Name-It: naming and detecting faces in news videos, IEEE Multimedia, vol.6, issue.1, pp.22-35, 1999.
DOI : 10.1109/93.752960

S. Saxena and J. Verbeek, Coordinated Local Metric Learning, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), 2015.
DOI : 10.1109/ICCVW.2015.56
URL : https://hal.archives-ouvertes.fr/hal-01215272

C. Schmid and R. Mohr, Local grayvalue invariants for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.19, issue.5, pp.530-534, 1997.
DOI : 10.1109/34.589215
URL : https://hal.archives-ouvertes.fr/inria-00548358

F. Schroff, D. Kalenichenko, and J. Philbin, FaceNet: A unified embedding for face recognition and clustering, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298682

A. Schwing and R. Urtasun, Fully connected deep structured networks, Arxiv preprint, 2015.

B. Settles, Active learning literature survey, 2009.

Z. Shi, T. Hospedales, and T. Xiang, Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.371

J. Shotton, J. Winn, C. Rother, and A. Criminisi, TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation, ECCV, pp.1-15, 2006.
DOI : 10.1007/11744023_1

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2015.

K. Simonyan, O. Parkhi, A. Vedaldi, and A. Zisserman, Fisher Vector Faces in the Wild, Proceedings of the British Machine Vision Conference 2013, 2013.
DOI : 10.5244/C.27.8

P. Siva, C. Russell, and T. Xiang, In Defence of Negative Mining for Annotating Weakly Labelled Data, ECCV, 2012.
DOI : 10.1007/978-3-642-33712-3_43

P. Siva and T. Xiang, Weakly supervised object detector learning with model drift detection, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126261

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238663

J. Sivic, M. Everingham, and A. Zisserman, "Who are you?": Learning person specific classifiers from video, CVPR, 2009.

H. Song, R. Girshick, S. Jegelka, J. Mairal, Z. Harchaoui et al., On learning to localize objects with minimal supervision, ICML, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00996849

H. Song, Y. Lee, S. Jegelka, and T. Darrell, Weakly-supervised discovery of visual pattern configurations, NIPS, 2014.

C. Sun and R. Nevatia, ACTIVE: Activity Concept Transitions in Video Event Classification, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.453

J. Tighe and S. Lazebnik, SuperParsing: Scalable Nonparametric Image Parsing with Superpixels, IJCV, vol.101, issue.2, pp.329-349, 2013.
DOI : 10.1007/978-3-642-15555-0_26

J. Uijlings, K. Van-de-sande, T. Gevers, and A. Smeulders, Selective Search for Object Recognition, International Journal of Computer Vision, vol.57, issue.1, pp.154-171, 2013.
DOI : 10.1007/s11263-013-0620-5

K. Van-de-sande, C. Snoek, and A. Smeulders, Fisher and VLAD with FLAIR, CVPR, 2014.

J. Van-de-weijer and C. Schmid, Coloring Local Feature Extraction, ECCV, 2006.
DOI : 10.1002/col.10049
URL : https://hal.archives-ouvertes.fr/inria-00548576

A. Vedaldi and A. Zisserman, Efficient additive kernels via explicit feature maps, CVPR, 2010.

J. Verbeek and B. Triggs, Region Classification with Markov Field Aspect Models, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383098
URL : https://hal.archives-ouvertes.fr/inria-00321129

J. Verbeek and B. Triggs, Scene segmentation with CRFs learned from partially labeled images, NIPS, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00321051

S. Vijayanarasimhan and K. Grauman, Large-scale live active learning: Training object detectors with crawled data and crowds, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995430

V. Vineet, J. Warrell, and P. Torr, Filter-based mean-field inference for random fields with higher-order terms and product label-spaces, 2014.

P. Viola and M. Jones, Robust Real-Time Face Detection, International Journal of Computer Vision, vol.57, issue.2, pp.137-154, 2004.
DOI : 10.1023/B:VISI.0000013087.49260.fb

C. Wang, W. Ren, K. Huang, and T. Tan, Weakly Supervised Object Localization with Latent Category Learning, ECCV, 2014.
DOI : 10.1007/978-3-319-10599-4_28

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.441
URL : https://hal.archives-ouvertes.fr/hal-00873267

H. Wang, A. Kläser, C. Schmid, and C. Liu, Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, pp.60-79, 2013.
DOI : 10.1007/s11263-012-0594-8
URL : https://hal.archives-ouvertes.fr/hal-00725627

H. Wang, D. Oneata, J. Verbeek, and C. Schmid, A Robust and Efficient Video Representation for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, 2015.
DOI : 10.1007/s11263-015-0846-5
URL : https://hal.archives-ouvertes.fr/hal-01145834

J. Wang, K. Sun, F. Sha, S. Marchand-maillet, and A. Kalousis, Two-stage metric learning, ICML, 2014.

X. Wang and A. Gupta, Unsupervised Learning of Visual Representations Using Videos, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.320

A. Webb, Statistical pattern recognition, 2002.

K. Weinberger and L. Saul, Distance metric learning for large margin nearest neighbor classification, JMLR, vol.10, pp.207-244, 2009.

K. Weinberger, J. Blitzer, and L. Saul, Distance metric learning for large margin nearest neighbor classification, NIPS, 2006.

J. Weston, S. Bengio, and N. Usunier, WSABIE: Scaling up to large vocabulary image annotation, IJCAI, 2011.

J. Winn, A. Criminisi, and T. Minka, Object categorization by learned universal visual dictionary, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.171

A. Yavlinsky, E. Schofield, and S. , Automated Image Annotation Using Global Features and Robust Nonparametric Density Estimation, CIVR, 2005. URL www.edschofield.com/publications
DOI : 10.1007/11526346_54

J. Yedidia, W. Freeman, and Y. Weiss, Understanding belief propagation and its generalizations, 2002.

J. Yuan, Z. Liu, and Y. Wu, Discriminative subvolume search for efficient action detection, CVPR, 2009.

H. Zhang, A. Berg, M. Maire, and J. Malik, SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.2126-2136, 2006.
DOI : 10.1109/CVPR.2006.301

J. Zhang, M. Marszałek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, International Journal of Computer Vision, vol.73, issue.2, pp.213-238, 2007.
DOI : 10.1007/s11263-006-9794-4
URL : https://hal.archives-ouvertes.fr/inria-00548574

C. Zitnick and P. Dollár, Edge Boxes: Locating Object Proposals from Edges, ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_26

W. Zou, S. Zhu, A. Ng, and K. Yu, Deep learning of invariant features via simulated fixations in video, NIPS, 2012.

Appendix A Curriculum vitae

Avenue de l'Europe, 38330 Montbonnot, France. Email: Jakob.Verbeek@inria.fr. Webpage: http://thoth.inrialpes.fr/?verbeek. Citizenship: Dutch. Date of birth: 1975.

Curriculum Vitae: Jakob Verbeek

Academic
Thesis: Mixture models for clustering and dimension reduction. Advisors: Prof. Dr. Ir. F. Groen et al.
Thesis: An information theoretic approach to finding word groups for text classification.
Thesis: Overfitting using the minimum description length principle. Dutch National Research Institute for Mathematics and Computer Science & University of Amsterdam. Advisors: Prof. Dr. P. Vitányi, Dr. P. Grünwald, and Dr. R. de Wolf.

Awards
2011: Outstanding Reviewer Award, IEEE Conference on Computer Vision and Pattern Recognition.

Professional Activities
Researcher (CR1), INRIA Rhône-Alpes.
-2005: Postdoc, Intelligent Autonomous Systems group, Informatics Institute.

Participation in Research Projects
2016-2018: Structured prediction for weakly supervised semantic segmentation, funded by Facebook Artificial Intelligence Research (FAIR) Paris and French national research and technology agency (ANRT).
2015-2016: Incremental learning for object category localization, MBDA Systems.
2013-2016: Physionomie: Physiognomic Recognition for Forensic Investigation, funded by French national research agency (ANR).
2011-2015: AXES: Access to Audiovisual Archives, European integrated project, 7th Framework Programme.
2010-2013: Quaero Consortium for Multimodal Person Recognition, funded by French national research agency (ANR).
2009-2012: Modeling multi-media documents for cross-media access, funded by Xerox Research Centre Europe (XRCE) and French national research and technology agency (ANRT).
2008-2010: Interactive Image Search, funded by French national research agency (ANR).
2006-2009: Cognitive-Level Annotation using Latent Statistical Structure (CLASS), funded by European Union Sixth Framework Programme.
2000-2005: Tools for Non-linear Data Analysis, funded by Dutch Technology Foundation (STW), 1998.

Veni, Netherlands Organisation for Scientific Research (NWO).

Miscellaneous

Research Visits
2011: Visiting researcher, Statistical Machine Learning group.

Summer Schools & Workshops
2015: DGA workshop on Big Data in Multimedia Information Processing, 2003.

StatLearn workshop.
2014: 3rd Croatian Computer Vision Workshop, Center of Excellence for Computer Vision.
2nd IST Workshop on Computer Vision and Machine Learning, 2011.

Texmex team, INRIA Rennes, France, Image categorization using Fisher kernels of non-iid image models; Modelling spatial layout for image classification.
Statistical Machine Learning group, 2011.

Parole group, Nancy.
Content Analysis group, Xerox Research Centre Europe, Manifold learning: unsupervised, correspondences, and semi-supervised.
2005: Learning and Recognition in Vision group, INRIA Rhône-Alpes, Manifold learning & image segmentation; Manifold learning with local linear models and Gaussian fields.
2004: Algorithms and Complexity group, Dutch Center for Mathematics and Computer Science, Semi-supervised dimension reduction through smoothing on graphs; Spectral methods for dimension reduction and nonlinear CCA; A generative model for the Self-Organizing Map.
Information and Language Processing Systems group.

Publications

In peer reviewed international journals

G. Cinbis, J. Verbeek, and C. Schmid, Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning.

G. Cinbis, J. Verbeek, and C. Schmid, Approximate Fisher kernels of non-iid image models for image categorization, IEEE Transactions on Pattern Analysis and Machine Intelligence.

H. Wang, D. Oneață, J. Verbeek, and C. Schmid, A Robust and Efficient Video Representation for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, 2015.
DOI : 10.1007/s11263-015-0846-5
URL : https://hal.archives-ouvertes.fr/hal-01145834

M. Douze, J. Revaud, J. Verbeek, H. Jégou, and C. Schmid, Circulant Temporal Encoding for Video Retrieval and Temporal Alignment, International Journal of Computer Vision, vol.33, issue.4, 2013.
DOI : 10.1007/s11263-015-0875-0
URL : https://hal.archives-ouvertes.fr/hal-01162603

J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, Image Classification with the Fisher Vector: Theory and Practice, International Journal of Computer Vision, vol.105, issue.3, pp.222-245, 2013.
DOI : 10.1007/s11263-013-0636-x

T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka, Distance-Based Image Classification: Generalizing to New Classes at Near-Zero Cost, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.11, pp.2624-2637, 2013.
DOI : 10.1109/TPAMI.2013.83
URL : https://hal.archives-ouvertes.fr/hal-00817211

T. Mensink, J. Verbeek, and G. Csurka, Tree-Structured CRF Models for Interactive Image Labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.2, pp.476-489, 2013.
DOI : 10.1109/TPAMI.2012.100
URL : https://hal.archives-ouvertes.fr/hal-00688143

D. Larlus, J. Verbeek, and F. Jurie, Category Level Object Segmentation by Combining Bag-of-Words Models with Dirichlet Processes and Random Fields, International Journal of Computer Vision, vol.88, issue.2, pp.238-253, 2010.
DOI : 10.1007/s11263-009-0245-x
URL : https://hal.archives-ouvertes.fr/inria-00439303

J. Van-de-weijer, C. Schmid, J. Verbeek, and D. Larlus, Learning Color Names for Real-World Applications, IEEE Transactions on Image Processing, vol.18, issue.7, pp.1512-1523, 2009.
DOI : 10.1109/TIP.2009.2019809
URL : https://hal.archives-ouvertes.fr/inria-00439284

J. Verbeek and N. Vlassis, Gaussian fields for semi-supervised regression and correspondence learning, Pattern Recognition, vol.39, issue.10, pp.1864-1875, 2006.
DOI : 10.1016/j.patcog.2006.04.011
URL : https://hal.archives-ouvertes.fr/inria-00321133

J. Verbeek, Learning nonlinear image manifolds by global alignment of local linear models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, issue.8, pp.1236-1250, 2006.
DOI : 10.1109/TPAMI.2006.166
URL : https://hal.archives-ouvertes.fr/inria-00321131

J. Porta, J. Verbeek, and B. Kröse, Active Appearance-Based Robot Localization Using Stereo Vision, Autonomous Robots, vol.18, issue.1, pp.59-80, 2005.
DOI : 10.1023/B:AURO.0000047287.00119.b6
URL : https://hal.archives-ouvertes.fr/inria-00321476

J. Verbeek, N. Vlassis, and B. Kröse, Self-organizing mixture models, Neurocomputing, vol.63, pp.99-123, 2003.
DOI : 10.1016/j.neucom.2004.04.008
URL : https://hal.archives-ouvertes.fr/inria-00321479

J. Verbeek, N. Vlassis, and B. Kröse, Efficient Greedy Learning of Gaussian Mixture Models, Neural Computation, vol.15, issue.2, pp.469-485, 2003.
DOI : 10.1214/aos/1176344374
URL : https://hal.archives-ouvertes.fr/inria-00321487

A. Likas, N. Vlassis, and J. Verbeek, The global k-means clustering algorithm, Pattern Recognition, vol.36, issue.2, pp.451-461, 2002.
DOI : 10.1016/S0031-3203(02)00060-2
URL : https://hal.archives-ouvertes.fr/inria-00321493

J. Verbeek, N. Vlassis, and B. Kröse, A k-segments algorithm for finding principal curves, Pattern Recognition Letters, vol.23, issue.8, pp.1009-1017, 2002.
DOI : 10.1016/S0167-8655(02)00032-6
URL : https://hal.archives-ouvertes.fr/inria-00321497

D. Oneață, J. Verbeek, and C. Schmid, Efficient Action Localization with Approximately Normalized Fisher Vectors, Proceedings IEEE Conference on Computer Vision and Pattern Recognition.

G. Cinbis, J. Verbeek et al., Segmentation Driven Object Detection with Fisher Vectors, Proceedings IEEE International Conference on Computer Vision, 2013.

D. Oneață, J. Verbeek, and C. Schmid, Action and Event Recognition with Fisher Vectors on a Compact Feature Set, Proceedings IEEE International Conference on Computer Vision.

T. Mensink, J. Verbeek et al., Metric learning for large scale image classification: generalizing to new classes at near-zero cost, Proceedings European Conference on Computer Vision, 2012.

G. Cinbis, J. Verbeek, and C. Schmid, Image categorization using Fisher kernels of non-iid image models, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247926
URL : https://hal.archives-ouvertes.fr/hal-00685943

J. Krapac, J. Verbeek, and F. Jurie, Modeling spatial layout with fisher vectors for image categorization, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126406
URL : https://hal.archives-ouvertes.fr/inria-00612277

G. Cinbis, J. Verbeek, and C. Schmid, Unsupervised metric learning for face identification in TV video, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126415
URL : https://hal.archives-ouvertes.fr/inria-00611682

J. Krapac, J. Verbeek, and F. Jurie, Learning tree-structured descriptor quantizers for image categorization, Proceedings British Machine Vision Conference, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00613118

T. Mensink, J. Verbeek, and G. Csurka, Learning structured prediction models for interactive image labeling, Proceedings IEEE Conference on Computer Vision and Pattern Recognition.

M. Guillaumin, J. Verbeek et al., Multiple instance metric learning from automatically labeled bags of faces, Proceedings European Conference on Computer Vision, 2010.

M. Guillaumin, J. Verbeek, and C. Schmid, Multimodal semi-supervised learning for image classification, Proceedings IEEE Conference on Computer Vision and Pattern Recognition, 2010.

J. Krapac, M. Allan, J. Verbeek, and F. Jurie, Improving web image search results using query-relative classifiers, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540092
URL : https://hal.archives-ouvertes.fr/inria-00548636

T. Mensink, J. Verbeek, and G. Csurka, Trans Media Relevance Feedback for Image Autoannotation, Proceedings of the British Machine Vision Conference 2010, 2010.
DOI : 10.5244/C.24.20
URL : https://hal.archives-ouvertes.fr/inria-00548632

T. Mensink, J. Verbeek, and H. Kappen, EP for efficient stochastic control with obstacles, Proceedings European Conference on Artificial Intelligence, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00548631

J. Verbeek, M. Guillaumin, T. Mensink, and C. Schmid, Image annotation with tagprop on the MIRFLICKR set, Proceedings of the international conference on Multimedia information retrieval, MIR '10, 2010.
DOI : 10.1145/1743384.1743476
URL : https://hal.archives-ouvertes.fr/inria-00548628

M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459266
URL : https://hal.archives-ouvertes.fr/inria-00439276

M. Guillaumin, J. Verbeek, and C. Schmid, Is that you? Metric learning approaches for face identification, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459197
URL : https://hal.archives-ouvertes.fr/inria-00439290

M. Allan and J. Verbeek, Ranking user-annotated images for multiple query terms, Proceedings British Machine Vision Conference.

M. Guillaumin, T. Mensink, J. Verbeek et al., Automatic face naming with caption-based supervision, Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.

T. Mensink and J. Verbeek, Improving People Search Using Query Expansions, Proceedings European Conference on Computer Vision, pp.86-99, 2008.
DOI : 10.1007/978-3-540-88688-4_7
URL : https://hal.archives-ouvertes.fr/inria-00321045

J. Verbeek and B. Triggs, Scene segmentation with CRFs learned from partially labeled images, Advances in Neural Information Processing Systems, pp.1553-1560, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00321051

H. Cevikalp, J. Verbeek, F. Jurie, A. Kläser, J. Van-de-weijer et al., Semi-supervised dimensionality reduction using pairwise equivalence constraints; Learning color names from real-world images, Proceedings International Conference on Computer Vision Theory and Applications; Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp.489-496, 2007.

J. Verbeek and B. Triggs, Region Classification with Markov Field Aspect Models, 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2007.
DOI : 10.1109/CVPR.2007.383098
URL : https://hal.archives-ouvertes.fr/inria-00321129

J. Van-de-weijer, C. Schmid, J. Verbeek, Z. Zivkovic, J. Verbeek et al., Using high-level visual information for color constancy; Transformation invariant component analysis for binary images; Non-linear CCA and PCA by alignment of local models, Proceedings IEEE International Conference on Computer Vision; Proceedings IEEE Conference on Computer Vision and Pattern Recognition; Advances in Neural Information Processing Systems 16, pp.1-8, 2003.

J. Porta, J. Verbeek, and B. Kröse, Enhancing appearance-based robot localization using non-dense disparity maps, Proceedings International Conference on Intelligent Robots and Systems, pp.980-985, 2003.

J. Verbeek, N. Vlassis, and B. Kröse, Self-organization by optimizing free-energy, Proceedings 11th European Symposium on Artificial Neural Networks, pp.125-130, 2002.
URL : https://hal.archives-ouvertes.fr/inria-00321491

J. Verbeek, N. Vlassis, and B. Kröse, Coordinating Principal Component Analyzers, Proceedings International Conference on Artificial Neural Networks, pp.914-919, 2002.
DOI : 10.1007/3-540-46084-5_148
URL : https://hal.archives-ouvertes.fr/inria-00321498

J. Verbeek, N. Vlassis, and B. Kröse, Fast nonlinear dimensionality reduction with topology preserving networks, Proceedings 10th European Symposium on Artificial Neural Networks, pp.193-198, 2001.
URL : https://hal.archives-ouvertes.fr/inria-00321500

J. Verbeek, N. Vlassis, and B. Kröse, A Soft k-Segments Algorithm for Principal Curves, Proceedings International Conference on Artificial Neural Networks, pp.450-456, 2001.
DOI : 10.1007/3-540-44668-0_63
URL : https://hal.archives-ouvertes.fr/inria-00321506

Book chapters

T. Mensink, J. Verbeek, F. Perronnin, G. Csurka et al., Large scale metric learning for distance-based image classification on open ended data sets, Advances in Computer Vision and Pattern Recognition, 2013.

Color in Computer Vision, Wiley, 2012.

Workshops and regional conferences

S. Saxena and J. Verbeek, Coordinated Local Metric Learning, ICCV ChaLearn Looking at People workshop, 2015.

V. Zadrija, J. Krapac, J. Verbeek, and S. Segvić, Patch-level Spatial Layout for Classification and Weakly Supervised Localization, German Conference on Pattern Recognition.

M. Douze et al., The INRIA-LIM-VocR and AXES submissions to Trecvid 2014 Multimedia Event Detection.

H. Bredin, J. Poignant, G. Fortier, M. Tapaswi, V. Le et al., QCompere @ REPERE 2013, Workshop on Speech, Language and Audio for Multimedia.

Parkhi, R. Arandjelovic, A. Zisserman, F. Basura, and T. Tuytelaars, AXES at TRECVid 2012: KIS, INS, and MED, TRECVID Workshop, 2012.

H. Bredin, J. Poignant, M. Tapaswi, G. Fortier, V. Bac-le et al., Fusion of Speech, Faces and Text for Person Identification in TV Broadcast.
DOI : 10.1007/978-3-642-33885-4_39
URL : https://hal.archives-ouvertes.fr/hal-00722884

Learning to Rank and Quadratic Assignment, NIPS Workshop on Discrete Optimization in Machine Learning.

T. Mensink, G. Csurka, F. Perronnin, J. Sánchez, and J. Verbeek, LEAR and XRCEs participation to Visual Concept Detection Task - ImageCLEF 2010, Working Notes for the CLEF 2010 Workshop, 2010.

M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, Apprentissage de distance pour l'annotation d'images par plus proches voisins, Reconnaissance des Formes et Intelligence Artificielle.
URL : https://hal.archives-ouvertes.fr/inria-00439309

M. Douze et al., INRIA-LEARs participation to ImageCLEF, Working Notes for the CLEF.

J. Nunnink, J. Verbeek, and N. Vlassis, Accelerated greedy mixture learning, Proceedings Annual Machine Learning Conference of Belgium and the Netherlands, pp.80-86, 2003.

J. Verbeek, N. Vlassis, and J. Nunnink, A variational EM algorithm for large-scale mixture modeling, Proceedings Conference of the Advanced School for Computing and Imaging, pp.136-143, 2003.
URL : https://hal.archives-ouvertes.fr/inria-00321486

J. Verbeek, N. Vlassis, and B. Kröse, Non-linear feature extraction by the coordination of mixture models, Proceedings Conference of the Advanced School for Computing and Imaging, pp.287-293, 2002.
URL : https://hal.archives-ouvertes.fr/inria-00321490

J. Verbeek, N. Vlassis, and B. Kröse, Locally linear generative topographic mapping, Proceedings Annual Machine Learning Conference of Belgium and the Netherlands, pp.79-86, 2001.
URL : https://hal.archives-ouvertes.fr/inria-00321501

J. Verbeek, N. Vlassis, and B. Kröse, Efficient Greedy Learning of Gaussian Mixture Models, Proceedings 13th Belgian-Dutch Conference on Artificial Intelligence, pp.251-258, 2001.
DOI : 10.1214/aos/1176344374
URL : https://hal.archives-ouvertes.fr/inria-00321487

J. Verbeek, N. Vlassis, and B. Kröse, Greedy Gaussian mixture learning for texture segmentation (oral), ICANN Workshop on Kernel and Subspace Methods for Computer Vision, pp.37-46, 2000.
URL : https://hal.archives-ouvertes.fr/inria-00321513

J. Verbeek, Supervised feature extraction for text categorization, Proceedings Annual Machine Learning Conference of Belgium and the Netherlands.

J. Verbeek, Using a sample-dependent coding scheme for two-part MDL, Proceedings Machine Learning & Applications (ACAI '99), 1999.

T. Mensink, J. Verbeek, G. Csurka, and F. Perronnin, Metric Learning for Nearest Class Mean Classifiers, United States Patent Application 20140029839, publication date: 01/30/2014, filing date: 07/30/2012, XEROX Corporation.

T. Mensink et al., Learning Structured prediction models for interactive image labeling, United States Patent Application 20120269436, publication date: 25, XEROX Corporation.

Retrieval systems and methods employing probabilistic cross-media relevance feedback, p.31, 2010.

J. Sanchez, F. Perronnin, T. Mensink, and J. Verbeek, Image classification with the Fisher vector: theory and practice.

T. Mensink et al., Large scale metric learning for distance-based image classification.

Region-based image classification with a latent SVM model, 2011.

J. Krapac, J. Verbeek, and F. Jurie, Spatial Fisher vectors for image categorization, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00613572

T. Mensink, J. Verbeek, and G. Csurka, Weighted transmedia relevance feedback for image retrieval and autoannotation.

M. Guillaumin, T. Mensink et al., Face recognition from caption-based supervision.

Category level object segmentation by combining bag-of-words models and Markov random fields.

J. Verbeek and N. Vlassis, Semi-supervised learning with Gaussian fields, 2005.

J. Verbeek, Rodent behavior annotation from video; J. Verbeek and N. Vlassis, Gaussian mixture learning from noisy data, 2002.
URL : https://hal.archives-ouvertes.fr/inria-00548500

J. Verbeek, N. Vlassis, and B. Kröse, The generative self-organizing map: a probabilistic generalization of Kohonen's SOM, 2002.

J. Verbeek, N. Vlassis, and B. Kröse, Procrustes analysis to coordinate mixtures of probabilistic principal component analyzers.

A. Likas, N. Vlassis et al., The global k-means clustering algorithm, 2001.

J. Verbeek, N. Vlassis, and B. Kröse, Efficient Greedy Learning of Gaussian Mixture Models, Neural Computation, vol.35, issue.1, 2000.
DOI : 10.1214/aos/1176344374
URL : https://hal.archives-ouvertes.fr/inria-00321487

J. Verbeek, N. Vlassis, and B. Kröse, A k-segments algorithm for finding principal curves, Pattern Recognition Letters, vol.23, issue.8, 2000.
DOI : 10.1016/S0167-8655(02)00032-6
URL : https://hal.archives-ouvertes.fr/inria-00321497