B. Alexe, T. Deselaers, and V. Ferrari, Measuring the Objectness of Image Windows, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.11, 2012.
DOI : 10.1109/TPAMI.2012.28

K. M. Borgwardt, A. Gretton, M. J. Rasch, H. Kriegel, B. Schölkopf et al., Integrating structured biological data by Kernel Maximum Mean Discrepancy, Bioinformatics, 2006.
DOI : 10.1093/bioinformatics/btl242

T. Brox and J. Malik, Object Segmentation by Long Term Analysis of Point Trajectories, ECCV, 2010.
DOI : 10.1007/978-3-642-15555-0_21

O. Chum and A. Zisserman, An Exemplar Model for Learning Object Classes, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383050

R. G. Cinbis, J. Verbeek, and C. Schmid, Multi-fold MIL Training for Weakly Supervised Object Localization, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.309

URL : https://hal.archives-ouvertes.fr/hal-00975746

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.177

URL : https://hal.archives-ouvertes.fr/inria-00548512

T. Deselaers, B. Alexe, and V. Ferrari, Weakly supervised localization and learning with generic knowledge, IJCV, 2012.

J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang et al., DeCAF: A deep convolutional activation feature for generic visual recognition, arXiv preprint, 2013.

L. Duan, I. W. Tsang, and D. Xu, Domain transfer multiple kernel learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, issue.7, 2012.

L. Duan, D. Xu, I. W. Tsang, and J. Luo, Visual event recognition in videos by learning from web data, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539870

M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, 2010.
DOI : 10.1007/s11263-009-0275-4

P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, 2010.
DOI : 10.1109/TPAMI.2009.167

R. Fergus, P. Perona, and A. Zisserman, Object class recognition by unsupervised scale-invariant learning, Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003.
DOI : 10.1109/CVPR.2003.1211479

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.81

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, https://github.com/rbgirshick, 2014.

R. B. Girshick, P. F. Felzenszwalb, and D. McAllester, Discriminatively trained deformable part models, release 5.

R. Gopalan, R. Li, and R. Chellappa, Unsupervised Adaptation Across Domain Shifts by Generating Intermediate Data Representations, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.11, 2014.
DOI : 10.1109/TPAMI.2013.249

J. Hoffman, E. Rodner, J. Donahue, B. Kulis, and K. Saenko, Asymmetric and Category Invariant Feature Transformations for Domain Adaptation, International Journal of Computer Vision, vol.109, issue.1-2, 2014.
DOI : 10.1007/s11263-014-0719-3

Y. Jia, Caffe, Proceedings of the ACM International Conference on Multimedia, MM '14, 2013.
DOI : 10.1145/2647868.2654889

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe, Proceedings of the ACM International Conference on Multimedia, MM '14, 2014.
DOI : 10.1145/2647868.2654889

G. Kim, L. Sigal, and E. P. Xing, Joint summarization of large sets of web images and videos for storyline reconstruction, CVPR, 2014.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, NIPS, 2012.

Y. J. Lee, J. Kim, and K. Grauman, Key-segments for video object segmentation, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126471

C. Leistner, M. Godec, S. Schulter, A. Saffari, and H. Bischof, Improving classifiers with unlabeled weakly-related videos, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995475

T. Malisiewicz, A. Gupta, and A. Efros, Ensemble of exemplar-SVMs for object detection and beyond, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126229

S. J. Pan and Q. Yang, A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, vol.22, issue.10, 2010.
DOI : 10.1109/TKDE.2009.191

M. Pandey and S. Lazebnik, Scene recognition and weakly supervised object localization with deformable part-based models, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126383

A. Papazoglou and V. Ferrari, Fast Object Segmentation in Unconstrained Video, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.223

A. Prest, C. Leistner, J. Civera, C. Schmid, and V. Ferrari, Learning object class detectors from weakly annotated video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248065

URL : https://hal.archives-ouvertes.fr/hal-00695940

P. Sharma and R. Nevatia, Efficient Detector Adaptation for Object Detection in a Video, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.418

P. Siva, C. Russell, T. Xiang, and L. Agapito, Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.416

P. Siva and T. Xiang, Weakly supervised object detector learning with model drift detection, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126261

P. Siva, T. Xiang, and C. Russell, In defence of negative mining for annotating weakly labeled data, ECCV, 2012.

H. Song, R. Girshick, S. Jegelka, J. Mairal, Z. Harchaoui et al., On learning to localize objects with minimal supervision, ICML, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00996849

K. Tang, V. Ramanathan, L. Fei-Fei, and D. Koller, Shifting weights: Adapting object detectors from image to video, NIPS, 2012.

K. Tang, R. Sukthankar, J. Yagnik, and L. Fei-Fei, Discriminative Segment Annotation in Weakly Labeled Video, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.321

A. Torralba and A. A. Efros, Unbiased look at dataset bias, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995347

J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders, Selective Search for Object Recognition, International Journal of Computer Vision, vol.104, issue.2, 2013.
DOI : 10.1007/s11263-013-0620-5

L. van der Maaten and G. Hinton, Visualizing data using t-SNE, JMLR, 2008.

P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001.
DOI : 10.1109/CVPR.2001.990517

L. Wang, Y. Qiao, and X. Tang, Video Action Detection with Relational Dynamic-Poselets, ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_37

X. Wang, M. Yang, S. Zhu, and Y. Lin, Regionlets for generic object detection, ICCV, 2013.