J. Alayrac, P. Bojanowski, N. Agrawal, I. Laptev, J. Sivic et al., Unsupervised Learning from Narrated Instruction Videos, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.495

URL : https://hal.archives-ouvertes.fr/hal-01171193

F. Bach and Z. Harchaoui, DIFFRAC: a discriminative and flexible framework for clustering, NIPS, 2007.

T. L. Berg, A. C. Berg, J. Edwards, M. Maire, R. White et al., Names and faces in the news, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., p.848, 2004.
DOI : 10.1109/CVPR.2004.1315253

P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid et al., Finding Actors and Actions in Movies, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.283

URL : https://hal.archives-ouvertes.fr/hal-00904991

P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce et al., Weakly Supervised Action Labeling in Videos under Ordering Constraints, ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_41

URL : https://hal.archives-ouvertes.fr/hal-01053967

P. Bojanowski, R. Lajugie, E. Grave, F. Bach, I. Laptev et al., Weakly-Supervised Alignment of Video with Text, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.507

URL : https://hal.archives-ouvertes.fr/hal-01154523

F. Caba Heilbron, V. Escorcia, B. Ghanem, and J. C. Niebles, ActivityNet: A large-scale video benchmark for human activity understanding, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.961-970, 2015.
DOI : 10.1109/CVPR.2015.7298698

T. Cour, B. Sapp, C. Jordan, and B. Taskar, Learning from ambiguously labeled images, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206667

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.1111

O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459279

M. Everingham, J. Sivic, and A. Zisserman, "Hello! My name is... Buffy" -- Automatic Naming of Characters in TV Video, Proceedings of the British Machine Vision Conference 2006, 2006.
DOI : 10.5244/C.20.92

M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Research Logistics Quarterly, vol.3, issue.1-2, 1956.
DOI : 10.1002/nav.3800030109

R. B. Girshick, P. F. Felzenszwalb, and D. McAllester, Discriminatively trained deformable part models, release 5

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.90

URL : http://arxiv.org/abs/1512.03385

M. Honnibal and M. Johnson, An Improved Non-monotonic Transition System for Dependency Parsing, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015.
DOI : 10.18653/v1/D15-1162

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.697.7084

M. Jaggi, Revisiting Frank-Wolfe: Projection-free sparse convex optimization, ICML, 2013.

A. Joulin, F. Bach, and J. Ponce, Discriminative clustering for image co-segmentation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539868

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.205.8594

A. Joulin, F. Bach, and J. Ponce, Multi-class cosegmentation, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247719

URL : https://hal.archives-ouvertes.fr/hal-00717448

A. Joulin, K. Tang, and L. Fei-Fei, Efficient Image and Video Co-localization with Frank-Wolfe Algorithm, ECCV, 2014.
DOI : 10.1007/978-3-319-10599-4_17

URL : http://ai.stanford.edu/%7Ekdtang/papers/eccv14-vidcoloc.pdf

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar et al., Large-scale video classification with convolutional neural networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.1725-1732, 2014.
DOI : 10.1109/cvpr.2014.223

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.471.3312

S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher, Block-coordinate Frank-Wolfe optimization for structural SVMs, ICML, 2013.

T. Lan, Y. Wang, and G. Mori, Discriminative figure-centric models for joint action localization and recognition, ICCV, 2011.

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756

URL : https://hal.archives-ouvertes.fr/inria-00548659

G. Levi, D. Kaufman, L. Wolf, and T. Hassner, Video Description by Combining Strong Representation and a Simple Nearest Neighbor Approach, ECCV LSMDC2016 Workshop, 2016.

M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool, Face Detection without Bells and Whistles, ECCV, 2014.
DOI : 10.1007/978-3-319-10593-2_47

A. Osokin, J. Alayrac, I. Lukasewitz, P. Dokania, and S. Lacoste-Julien, Minding the gaps for block Frank-Wolfe optimization of structured SVMs, ICML, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01323727

O. Parkhi, E. Rahtu, and A. Zisserman, It's in the bag: Stronger supervision for automated face labelling, ICCV Workshop, 2015.

O. Parkhi, A. Vedaldi, and A. Zisserman, Deep Face Recognition, Proceedings of the British Machine Vision Conference 2015, 2015.
DOI : 10.5244/C.29.41

V. Ramanathan, A. Joulin, P. Liang, and L. Fei-Fei, Linking People in Videos with "Their" Names Using Coreference Resolution, ECCV, 2014.
DOI : 10.1007/978-3-319-10590-1_7

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.646.430

A. Rohrbach, M. Rohrbach, N. Tandon, and B. Schiele, A dataset for Movie Description, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298940

URL : http://arxiv.org/abs/1501.02530

G. Seguin, P. Bojanowski, R. Lajugie, and I. Laptev, Instance-Level Video Segmentation from Object Tracks, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.400

URL : https://hal.archives-ouvertes.fr/hal-01255765

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, NIPS, 2015.

N. Shapovalova, A. Vahdat, K. Cannons, T. Lan, and G. Mori, Similarity Constrained Latent Support Vector Machine: An Application to Weakly Supervised Action Classification, ECCV, 2012.
DOI : 10.1007/978-3-642-33786-4_5

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.261.850

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, NIPS, pp.568-576, 2014.

J. Sivic, M. Everingham, and A. Zisserman, "Who are you?" - Learning person specific classifiers from video, CVPR, 2009.
DOI : 10.1109/cvprw.2009.5206513

K. Soomro, A. R. Zamir, and M. Shah, UCF101: A dataset of 101 human actions classes from videos in the wild, 2012.

M. Tapaswi, M. Bauml, and R. Stiefelhagen, "Knock! Knock! Who is it?" Probabilistic person identification in TV-series, CVPR, 2012.
DOI : 10.1109/cvpr.2012.6247986

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.441

URL : https://hal.archives-ouvertes.fr/hal-00873267

L. Wang, Y. Qiao, and X. Tang, Action recognition with trajectory-pooled deep-convolutional descriptors, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4305-4314, 2015.
DOI : 10.1109/CVPR.2015.7299059

URL : http://arxiv.org/abs/1505.04868

P. Weinzaepfel, X. Martin, and C. Schmid, Towards weakly-supervised action localization, arXiv preprint, 2016.

Y. Yu, H. Ko, J. Choi, and G. Kim, Video captioning and retrieval models with semantic attention, ECCV LSMDC2016 Workshop, 2016.