S. Andrews, I. Tsochantaridis, and T. Hofmann, Support vector machines for multiple-instance learning, Advances in Neural Information Processing Systems, 2003.

D. Blei and M. Jordan, Modeling annotated data, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval , SIGIR '03, 2003.
DOI : 10.1145/860435.860460
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.6686

G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, ECCV Int. Workshop on Stat. Learning in Computer Vision, 2004.

A. Dempster, N. Laird, and D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, vol.39, issue.1, pp.1-38, 1977.

T. Dietterich, R. Lathrop, and T. Lozano-pérez, Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, vol.89, issue.1-2, pp.31-71, 1997.
DOI : 10.1016/S0004-3702(96)00034-3

P. Duygulu, K. Barnard, N. De-freitas, and D. Forsyth, Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, European Conference on Computer Vision, 2002.
DOI : 10.1007/3-540-47979-1_7

M. Everingham, L. Van-gool, C. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, 2007.
DOI : 10.1007/s11263-009-0275-4
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.6629

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, LIBLINEAR: A library for large linear classification, Journal of Machine Learning Research, pp.1871-1874, 2008.

P. Felzenszwalb, R. Girshick, D. Mcallester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1627-1645, 2010.
DOI : 10.1109/TPAMI.2009.167

B. Fulkerson, A. Vedaldi, and S. Soatto, Class segmentation and object localization with superpixel neighborhoods, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459175
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.4613

J. Kelley, The Cutting-Plane Method for Solving Convex Programs, Journal of the Society for Industrial and Applied Mathematics, vol.8, issue.4, pp.703-712, 1960.
DOI : 10.1137/0108053

F. Shahbaz-khan, J. Van-de-weijer, and M. Vanrell, Top-down color attention for object recognition, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459362

V. Kolmogorov and R. Zabih, What energy functions can be minimized via graph cuts? IEEE Pattern Analysis and Machine Intelligence, 2004.
DOI : 10.1109/tpami.2004.1262177
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.113.1823

C. Lampert, M. Blaschko, and T. Hofmann, Efficient Subwindow Search: A Branch and Bound Framework for Object Localization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.12, pp.312129-2142, 2009.
DOI : 10.1109/TPAMI.2009.144

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.68
URL : https://hal.archives-ouvertes.fr/inria-00548585

D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.4931

M. Nguyen, L. Torresani, F. De-la-torre, and C. Rother, Weakly supervised discriminative localization and classification: a joint learning process, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459426

F. Perronnin and C. Dance, Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383266

F. Perronnin, J. Sanchez, and T. Mensink, Improving the Fisher kernel for largescale image classification, European Conference on Computer Vision, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00548630

X. Ren and J. Malik, Learning a classification model for segmentation, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238308

B. Russell, W. Freeman, A. Efros, J. Sivic, and A. Zisserman, Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.326

P. Schnitzspan, S. Roth, and B. Schiele, Automatic discovery of meaningful object parts with latent CRFs, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540220

J. Shi and J. Malik, Normalized cuts and image segmentation, International Conference on Computer Vision and Pattern Recognition, 1997.

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large margin methods for structured and interdependent output variables, Journal of Machine Learning Research, vol.6, pp.1453-1484, 2005.

V. Vapnik, The Nature of Statistical Learning Theory, 1995.

Z. Zha, X. Hua, T. Mei, J. Wang, G. Qi et al., Joint multi-label multi-instance learning for image classification, International Conference on Computer Vision and Pattern Recognition, 2008.

M. Zhang and Z. Zhou, Multi-instance multi-label learning with application to scene classification, Advances in Neural Information Processing Systems, 2006.

M. Zhang and Z. Zhou, M3MIML: A Maximum Margin Method for Multi-instance Multi-label Learning, 2008 Eighth IEEE International Conference on Data Mining, 2008.
DOI : 10.1109/ICDM.2008.27