A. Ayvaci, M. Raptis, and S. Soatto, Sparse Occlusion Detection with Optical Flow, International Journal of Computer Vision, vol.31, issue.5, 2012.
DOI : 10.1007/s11263-011-0490-7

F. Bach and Z. Harchaoui, Diffrac: a discriminative and flexible framework for clustering, NIPS, 2007.

A. Blake and A. Zisserman, Visual reconstruction, 1987.

P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid et al., Finding Actors and Actions in Movies, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.283
URL : https://hal.archives-ouvertes.fr/hal-00904991

P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce et al., Weakly Supervised Action Labeling in Videos under Ordering Constraints, ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_41
URL : https://hal.archives-ouvertes.fr/hal-01053967

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

J. Chang, D. Wei, and J. W. Fisher III, A Video Representation Using Temporal Superpixels, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.267

A. Colombari, A. Fusiello, and V. Murino, Segmentation and tracking of multiple video objects, Pattern Recognition, 2007.

M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, Accurate Scale Estimation for Robust Visual Tracking, Proceedings of the British Machine Vision Conference 2014, 2014.
DOI : 10.5244/C.28.65

M. Everingham, J. Sivic, and A. Zisserman, "Hello! My name is... Buffy" -- Automatic Naming of Characters in TV Video, Proceedings of the British Machine Vision Conference 2006, 2006.
DOI : 10.5244/C.20.92

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The PASCAL Visual Object Classes (VOC) Challenge, IJCV, 2010.

A. Fathi, M. Balcan, X. Ren, and J. M. Rehg, Combining Self Training and Active Learning for Video Segmentation, Proceedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.78

M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Research Logistics Quarterly, vol.3, issue.1-2, 1956.
DOI : 10.1002/nav.3800030109

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.81

Y. Guo and D. Schuurmans, Convex relaxations of latent variable training, NIPS, 2007.

B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, Simultaneous Detection and Segmentation, ECCV, 2014.
DOI : 10.1007/978-3-319-10584-0_20

X. He and S. Gould, Multi-instance Object Segmentation with Exemplars, 2013 IEEE International Conference on Computer Vision Workshops, 2013.
DOI : 10.1109/ICCVW.2013.9

A. Hernández-Vela, M. Reyes, V. Ponce, and S. Escalera, GrabCut-Based Human Segmentation in Video Sequences, Sensors, vol.12, issue.11, 2012.
DOI : 10.3390/s121115376

P. Jaccard, The Distribution of the Flora in the Alpine Zone, New Phytologist, vol.11, issue.2, 1912.
DOI : 10.1111/j.1469-8137.1912.tb05611.x

M. Jaggi, Revisiting Frank-Wolfe: Projection-free sparse convex optimization, ICML, 2013.

S. D. Jain and K. Grauman, Supervoxel-Consistent Foreground Propagation in Video, ECCV, 2014.
DOI : 10.1007/978-3-319-10593-2_43

A. Joulin, F. Bach, and J. Ponce, Discriminative clustering for image co-segmentation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539868

A. Joulin, F. Bach, and J. Ponce, Multi-class cosegmentation, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247719
URL : https://hal.archives-ouvertes.fr/hal-00717448

L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, Associative hierarchical CRFs for object class image segmentation, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459248

Y. J. Lee, J. Kim, and K. Grauman, Key-segments for video object segmentation, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126471

V. S. Lempitsky, P. Kohli, C. Rother, and T. Sharp, Image segmentation with a bounding box prior, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459262

J. Lezama, K. Alahari, J. Sivic, and I. Laptev, Track to the future: Spatio-temporal video segmentation with long-range motion cues, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.6044588
URL : https://hal.archives-ouvertes.fr/hal-00817961

F. Li, T. Kim, A. Humayun, D. Tsai, and J. M. Rehg, Video Segmentation by Tracking Many Figure-Ground Segments, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.273

P. Ochs, J. Malik, and T. Brox, Segmentation of Moving Objects by Long Term Video Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.6, 2014.
DOI : 10.1109/TPAMI.2013.242

G. Papandreou, L. Chen, K. Murphy, and A. L. Yuille, Weakly- and semi-supervised learning of a DCNN for semantic image segmentation, ICCV, 2015.

A. Papazoglou and V. Ferrari, Fast Object Segmentation in Unconstrained Video, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.223

V. Ramanathan, A. Joulin, P. Liang, and L. Fei-Fei, Linking People in Videos with "Their" Names Using Coreference Resolution, ECCV, 2014.
DOI : 10.1007/978-3-319-10590-1_7

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS, 2015.
DOI : 10.1109/TPAMI.2016.2577031

C. Rother, V. Kolmogorov, and A. Blake, "GrabCut": Interactive foreground extraction using iterated graph cuts, SIGGRAPH, 2004.

G. Seguin, K. Alahari, J. Sivic, and I. Laptev, Pose estimation and segmentation of people in 3D movies, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00874884

J. Shi and J. Malik, Normalized cuts and image segmentation, CVPR, 1997.

J. Shi and C. Tomasi, Good features to track, CVPR, 1994.

J. Snoek, H. Larochelle, and R. P. Adams, Practical Bayesian optimization of machine learning algorithms, NIPS, 2012.

K. Tang, A. Joulin, L. Li, and L. Fei-Fei, Co-localization in Real-World Images, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.190

B. Taylor, V. Karasev, and S. Soatto, Causal video object segmentation from persistence of occlusions, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299055

J. Tighe, M. Niethammer, and S. Lazebnik, Scene Parsing with Object Instances and Occlusion Ordering, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.479

V. Vineet, J. Warrell, L. Ladicky, and P. Torr, Human Instance Segmentation from Video using Detector-based Conditional Random Fields, Proceedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.80

P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid, DeepFlow: Large Displacement Optical Flow with Deep Matching, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.175
URL : https://hal.archives-ouvertes.fr/hal-00873592

M. Zaslavskiy, F. Bach, and J. Vert, A Path Following Algorithm for the Graph Matching Problem, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.12, 2009.
DOI : 10.1109/TPAMI.2008.245
URL : https://hal.archives-ouvertes.fr/hal-00232851

Y. Zhang, X. Chen, J. Li, C. Wang, and C. Xia, Semantic object segmentation via detection in weakly labeled video, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298987

Z. Zhang, A. Schwing, S. Fidler, and R. Urtasun, Monocular Object Instance Segmentation and Depth Ordering with CNNs, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.300