K. Alahari and C. V. Jawahar, Discriminative Actions for Recognising Events, ICVGIP, 2006.

K. Alahari and C. V. Jawahar, Dynamic Events as Mixtures of Spatial and Temporal Features, ICVGIP, 2006.

K. Alahari, P. Kohli, and P. H. Torr, Dynamic Hybrid Algorithms for MAP Inference in Discrete MRFs, Trans. PAMI, vol.32, issue.10, pp.1846-1857, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01216727

K. Alahari, P. Kohli, and P. H. Torr, Reduce, Reuse & Recycle: Efficiently Solving Multi-Label MRFs, 2008.

K. Alahari, S. Kuthirummal, C. V. Jawahar, and P. J. Narayanan, Geometric and Stochastic Error Minimisation in Motion Tracking, 2004.

K. Alahari, S. L. Putrevu, and C. V. Jawahar, Discriminant Substrokes for Online Handwriting Recognition, ICDAR, 2005.

K. Alahari, S. L. Putrevu, and C. V. Jawahar, Learning Mixtures of Offline and Online features for Handwritten Stroke Recognition, ICPR, 2006.

K. Alahari, G. Seguin, J. Sivic, and I. Laptev, Pose Estimation and Segmentation of People in 3D Movies, ICCV, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00874884

F. M. Castro, M. J. Marín-Jiménez, N. Guil, C. Schmid, and K. Alahari, End-to-End Incremental Learning, ECCV, 2018.

A. Cherian, J. Mairal, K. Alahari, and C. Schmid, Mixing Body-Part Sequences for Human Pose Estimation, CVPR, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00978643

M. Cho, K. Alahari, and J. Ponce, Learning Graphs to Match, ICCV, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00875105

Y. Hua, K. Alahari, and C. Schmid, Occlusion and Motion Reasoning for Long-term Tracking, ECCV, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01020149

Y. Hua, K. Alahari, and C. Schmid, Online Object Tracking with Proposal Selection, ICCV, 2015.

L. Ladicky, P. Sturgess, K. Alahari, C. Russell, and P. Torr, What, Where & How Many? Combining Object Detectors and CRFs, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01216730

J. Lezama, K. Alahari, J. Sivic, and I. Laptev, Track to the Future: Spatiotemporal Video Segmentation with Long-range Motion Cues, CVPR, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00817961

A. Mishra, K. Alahari, and C. V. Jawahar, Enhancing Energy Minimization Framework for Scene Text Recognition with Top-Down Cues, vol.145, pp.30-42, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01263322

A. Mishra, K. Alahari, and C. V. Jawahar, Image Retrieval using Textual Cues, ICCV, 2013.

A. Mishra, K. Alahari, and C. V. Jawahar, Scene Text Recognition using Higher Order Language Priors, BMVC, 2012.

A. Mishra, K. Alahari, and C. V. Jawahar, Top-Down and Bottom-Up Cues for Scene Text Recognition, CVPR, 2012.

S. Ramalingam, P. Kohli, K. Alahari, and P. Torr, Exact Inference in Multi-label CRFs with Higher Order Cliques, CVPR, 2008.
URL : https://hal.archives-ouvertes.fr/hal-01217304

U. Roy, A. Mishra, K. Alahari, and C. V. Jawahar, Scene Text Recognition and Retrieval for Large Lexicons, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01088739

R. Sarvadevabhatla, K. Alahari, and C. V. Jawahar, Recognizing Human Activities from Constituent Actions, National Conf. Communications, 2005.

G. Seguin, K. Alahari, J. Sivic, and I. Laptev, Pose Estimation and Segmentation of Multiple People in Stereoscopic Movies, Trans. PAMI, vol.37, issue.8, pp.1643-1655, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01089660

K. Shmelkov, C. Schmid, and K. Alahari, How good is my GAN?, ECCV, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01850447

K. Shmelkov, C. Schmid, and K. Alahari, Incremental Learning of Object Detectors without Catastrophic Forgetting, ICCV, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01573623

P. Sturgess, K. Alahari, L. Ladicky, and P. H. Torr, Combining Appearance and Structure from Motion Features for Road Scene Understanding, BMVC, 2009.
URL : https://hal.archives-ouvertes.fr/hal-01216879

P. Tokmakov, K. Alahari, and C. Schmid, Learning Motion Patterns in Videos, CVPR, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01427480

P. Tokmakov, K. Alahari, and C. Schmid, Learning Video Object Segmentation with Visual Memory, 2017.

P. Tokmakov, K. Alahari, and C. Schmid, Weakly-Supervised Semantic Segmentation using Motion Cues, ECCV, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01292794

P. Tokmakov, C. Schmid, and K. Alahari, Learning to Segment Moving Objects, IJCV, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01653720

ICDAR 2003 datasets

Street View Text dataset

E. H. Adelson, On seeing stuff: the perception of materials by humans and machines, Proc. SPIE, vol.4299, pp.1-12, 2001.

M. Andriluka, S. Roth, and B. Schiele, People-tracking-by-detection and people-detection-by-tracking, CVPR, 2008.

M. Andriluka, S. Roth, and B. Schiele, Pictorial Structures Revisited: People Detection and Articulated Pose Estimation, CVPR, 2009.

R. Arandjelovic and A. Zisserman, Objects that Sound, ECCV, 2018.

P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, Contour Detection and Hierarchical Image Segmentation, Trans. PAMI, vol.33, issue.5, pp.898-916, 2011.

P. Arbeláez, J. Pont-Tuset, J. T. Barron, F. Marques, and J. Malik, Multiscale Combinatorial Grouping, CVPR, 2014.

S. Avidan, Ensemble Tracking, Trans. PAMI, vol.29, issue.2, pp.261-271, 2007.

B. Babenko, M. Yang, and S. Belongie, Robust Object Tracking with Online Multiple Instance Learning, Trans. PAMI, vol.33, issue.8, pp.1619-1632, 2011.

D. Batra, P. Yadollahpour, A. Guzman-Rivera, and G. Shakhnarovich, Diverse M-best solutions in Markov random fields, ECCV, 2012.

S. Bell, C. L. Zitnick, K. Bala, and R. Girshick, Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks, CVPR, 2016.

A. Bianne-Bernard, F. Menasri, R. A. Mohamad, C. Mokbel, C. Kermorvant et al., Dynamic and Contextual Information in HMM Modeling for Handwritten Word Recognition, Trans. PAMI, vol.33, issue.10, pp.2066-2080, 2011.

P. Bideau and E. G. Learned-Miller, It's Moving! A Probabilistic Model for Causal Motion Segmentation in Moving Camera Videos, ECCV, 2016.

A. Bissacco, M. Cummins, Y. Netzer, and H. Neven, PhotoOCR: Reading Text in Uncontrolled Conditions, ICCV, 2013.

A. Blake, C. Rother, M. Brown, P. Perez, and P. H. Torr, Interactive Image Segmentation Using an Adaptive GMMRF Model, ECCV, 2004.

E. Boros and P. L. Hammer, Pseudo-Boolean optimization, Discrete Applied Mathematics, 2002.
URL : https://hal.archives-ouvertes.fr/hal-01150533

Y. Boykov and M. Jolly, Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images, ICCV, 2001.

Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, Trans. PAMI, vol.23, issue.11, pp.1222-1239, 2001.

W. Brendel and S. Todorovic, Video object segmentation by tracking regions, ICCV, 2009.

T. Brox and J. Malik, Object Segmentation by Long Term Analysis of Point Trajectories, ECCV, 2010.

T. Brox and J. Malik, Large displacement optical flow: Descriptor matching in variational motion estimation, Trans. PAMI, vol.33, issue.3, pp.510-513, 2011.

S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. Van Gool, One-Shot Video Object Segmentation, 2017.

G. Cauwenberghs and T. Poggio, Incremental and Decremental Support Vector Machine Learning, NIPS, 2000.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected CRFs, 2015.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Trans. PAMI, vol.40, issue.4, pp.834-848, 2018.

X. Chen, A. Shrivastava, and A. Gupta, NEIL: Extracting Visual Knowledge from Web Data, ICCV, 2013.

D. Comaniciu and P. Meer, Mean shift: A robust approach toward feature space analysis, Trans. PAMI, 2002.

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00548512

M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, Accurate Scale Estimation for Robust Visual Tracking, BMVC, 2014.

A. Delong, A. Osokin, H. N. Isack, and Y. Boykov, Fast Approximate Energy Minimization with Label Costs, CVPR, 2010.

D. Dementhon, Spatio-temporal segmentation of video by hierarchical mean shift analysis, Statistical Methods in Video Processing Workshop, 2002.

C. Desai, D. Ramanan, and C. Fowlkes, Discriminative Models for Multi-class Object Layout, ICCV, 2009.

S. Divvala, A. Farhadi, and C. Guestrin, Learning Everything about Anything: Webly-Supervised Visual Concept Learning, CVPR, 2014.

P. Dollár and C. L. Zitnick, Structured Forests for Fast Edge Detection, ICCV, 2013.

A. Dosovitskiy, P. Fischer, E. Ilg, P. Häusser, C. Hazırbaş et al., FlowNet: Learning Optical Flow with Convolutional Networks, ICCV, 2015.

M. Eichner, M. Marín-Jiménez, A. Zisserman, and V. Ferrari, 2D Articulated Human Pose Estimation and Retrieval in (Almost) Unconstrained Still Images, IJCV, vol.99, issue.2, pp.190-214, 2012.

K. Elagouni, C. Garcia, and P. Sébillot, A comprehensive neural-based approach for text recognition in videos using natural language processing, ICMR, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00645219

I. Endres and D. Hoiem, Category-Independent Object Proposals with Diverse Ranking, Trans. PAMI, vol.36, issue.2, pp.222-234, 2014.

B. Epshtein, E. Ofek, and Y. Wexler, Detecting Text in Natural Scenes with Stroke Width Transform, CVPR, 2010.

M. Everingham, S. M. Eslami, L. Van Gool, C. K. Williams, J. Winn et al., The Pascal Visual Object Classes Challenge: A Retrospective, IJCV, vol.111, issue.1, pp.98-136, 2015.

M. Everingham, J. Sivic, and A. Zisserman, "Hello! My name is... Buffy" - Automatic naming of characters in TV video, BMVC, 2006.

A. Faktor and M. Irani, Video Segmentation by Non-Local Consensus Voting, BMVC, 2014.

C. Farabet, C. Couprie, L. Najman, and Y. LeCun, Learning hierarchical features for scene labeling, Trans. PAMI, vol.35, issue.8, pp.1915-1929, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00742077

J. L. Feild and E. G. Learned-Miller, Improving Open-Vocabulary Scene Text Recognition, ICDAR, 2013.

P. F. Felzenszwalb and D. P. Huttenlocher, Efficient Graph-Based Image Segmentation, IJCV, vol.59, issue.2, pp.167-181, 2004.

P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, Trans. PAMI, vol.32, issue.9, pp.1627-1645, 2010.

P. Felzenszwalb and D. Huttenlocher, Pictorial structures for object recognition, IJCV, vol.61, issue.1, pp.55-79, 2005.

P. Felzenszwalb and D. Huttenlocher, Distance Transforms of Sampled Functions, Theory of Computing, vol.8, 2012.

V. Ferrari, M. Marín-Jiménez, and A. Zisserman, Progressive search space reduction for human pose estimation, CVPR, 2008.

M. Fischler and R. Elschlager, The representation and matching of pictorial structures, IEEE Trans. Computers, vol.100, issue.1, pp.67-92, 1973.

K. Fragkiadaki, H. Hu, and J. Shi, Pose from Flow and Flow from Pose, CVPR, 2013.

J. Gao, Z. Yang, C. Sun, K. Chen, and R. Nevatia, TURN TAP: Temporal Unit Regression Networks for Temporal Action Proposals, 2017.

R. Girshick, Fast R-CNN, ICCV, 2015.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR, 2014.

L. Gómez and D. Karatzas, Scene Text Recognition: No Country for Old Men?, ACCV Workshops, 2014.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets, NIPS, 2014.

A. Graves, N. Jaitly, and A. Mohamed, Hybrid speech recognition with deep bidirectional LSTM, Workshop on Automatic Speech Recognition and Understanding, 2013.

A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka et al., Hybrid computing using a neural network with dynamic external memory, 2016.

M. Grundmann, V. Kwatra, M. Han, and I. Essa, Efficient Hierarchical Graph-Based Video Segmentation, CVPR, 2010.

V. Gulshan, V. Lempitsky, and A. Zisserman, Humanising GrabCut: Learning to segment humans using the Kinect, ICCV Workshop Consumer Depth Cameras for Computer Vision, 2011.

B. Han, H. Adam, and J. Sim, BranchOut: Regularization for Online Ensemble Tracking with CNNs, 2017.

S. Hare, A. Saffari, and P. H. Torr, Struck: Structured output tracking with kernels, ICCV, 2011.

R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2004.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, CVPR, 2016.

J. Henriques, R. Caseiro, P. Martins, and J. Batista, High-Speed Tracking with Kernelized Correlation Filters, Trans. PAMI, vol.37, issue.3, pp.583-596, 2015.

G. Hinton, O. Vinyals, and J. Dean, Distilling the knowledge in a neural network, NIPS, 2014.

S. Hochreiter and J. Schmidhuber, Long Short-term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

S. Hong, D. Yeo, S. Kwak, H. Lee, and B. Han, Weakly Supervised Semantic Segmentation using Web-Crawled Videos, 2017.

P. V. Hough, Method and means for recognizing complex patterns, US Patent 3,069,654, 1962.

E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy et al., Flownet 2.0: Evolution of optical flow estimation with deep networks, 2017.

M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, Reading Text in the Wild with Convolutional Neural Networks, IJCV, vol.116, issue.1, pp.1-20, 2016.

M. Jaderberg, A. Vedaldi, and A. Zisserman, Deep Features for Text Spotting, ECCV, 2014.

S. Jain, B. Xiong, and K. Grauman, Fusionseg: Learning to combine motion and appearance for fully automatic segmention of generic objects in videos, 2017.

T. Judd, K. Ehinger, F. Durand, and A. Torralba, Learning to Predict Where Humans Look, ICCV, 2009.

Z. Kalal, K. Mikolajczyk, and J. Matas, Tracking-Learning-Detection, Trans. PAMI, vol.34, issue.7, pp.1409-1422, 2012.

R. E. Kalman, A New Approach to Linear Filtering and Prediction Problems, J. Fluids Engineering, 1960.

D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. Bigorda et al., ICDAR 2013 Robust Reading Competition, 2013.

C. Keller, M. Enzweiler, M. Rohrbach, D. Llorca, C. Schnorr et al., The Benefits of Dense Stereo for Pedestrian Detection, IEEE Trans. Intell. Transp. Syst, vol.12, issue.4, pp.1096-1106, 2011.

M. Keuper, B. Andres, and T. Brox, Motion trajectory segmentation via minimum cost multicuts, ICCV, 2015.

A. Khoreva, F. Galasso, M. Hein, and B. Schiele, Classifier Based Graph Construction for Video Segmentation, CVPR, 2015.

A. Khoreva, F. Perazzi, R. Benenson, B. Schiele, and A. Sorkine-Hornung, Learning Video Object Segmentation from Static Images, 2017.

J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins et al., Overcoming catastrophic forgetting in neural networks, 2017.

Y. J. Koh and C. Kim, Primary Object Segmentation in Videos Based on Region Augmentation and Reduction, 2017.

V. Kolmogorov, A. Criminisi, A. Blake, G. Cross, and C. Rother, Bi-layer segmentation of binocular stereo video, CVPR, 2005.

V. Kolmogorov and R. Zabih, What energy functions can be minimized via graph cuts, Trans. PAMI, vol.26, issue.2, pp.147-159, 2004.

V. Kolmogorov, Convergent Tree-Reweighted Message Passing for Energy Minimization, Trans. PAMI, vol.28, issue.10, pp.1568-1583, 2006.

N. Komodakis, N. Paragios, and G. Tziritas, MRF Optimization via Dual Decomposition: Message-Passing Revisited, ICCV, 2007.

S. Koppal, C. Zitnick, M. Cohen, S. Kang, B. Ressler et al., A viewer-centric editor for 3D movies, Computer Graphics and Applications, vol.31, issue.1, pp.20-35, 2011.

P. Krähenbühl and V. Koltun, Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS, 2011.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, NIPS, 2012.

D. Kumar, M. N. Prasad, and A. G. Ramakrishnan, NESP: Nonlinear enhancement and selection of plane for optimal segmentation and recognition of scene word images, 2013.

M. P. Kumar, P. H. Torr, and A. Zisserman, Learning Layered Motion Segmentations of Video, ICCV, 2005.

L. Ladicky, P. H. Torr, and A. Zisserman, Human Pose Estimation using a Joint Pixel-wise and Part-wise Formulation, CVPR, 2013.

J. Lafferty, A. McCallum, and F. Pereira, Conditional Random Fields: Probabilistic models for segmenting and labelling sequence data, ICML, 2001.

M. W. Lee and R. Nevatia, Human pose tracking in monocular sequence using multilevel structured models, Trans. PAMI, vol.31, issue.1, pp.27-38, 2009.

Y. Lee, J. Kim, and K. Grauman, Key-segments for video object segmentation, ICCV, 2011.

Z. Li and D. Hoiem, Learning without forgetting, ECCV, 2016.

T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft COCO: Common objects in context, 2014.

C. Liu, Beyond Pixels: Exploring New Representations and Applications for Motion Analysis, 2009.

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, CVPR, 2015.

L. Lovász and M. D. Plummer, Matching theory, 1986.

M. Kristan, The Visual Object Tracking VOT2014 challenge results, ECCV Visual Object Tracking Challenge Workshop, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01301090

M. Marszałek, I. Laptev, and C. Schmid, Actions in Context, CVPR, 2009.

I. Matthews, T. Ishikawa, and S. Baker, The Template Update Problem, Trans. PAMI, vol.26, issue.6, pp.810-815, 2004.

N. Mayer, E. Ilg, P. Häusser, P. Fischer, D. Cremers et al., A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation, CVPR, 2016.

M. McCloskey and N. J. Cohen, Catastrophic interference in connectionist networks: The sequential learning problem, Psychology of learning and motivation, vol.24, pp.109-165, 1989.

T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka, Distance-Based Image Classification: Generalizing to New Classes at Near-Zero Cost, Trans. PAMI, vol.35, issue.11, pp.2624-2637, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00817211

A. Monroy and B. Ommer, Beyond Bounding-Boxes: Learning Object Shape by Model-Driven Grouping, ECCV, 2012.

M. Mostajabi, P. Yadollahpour, and G. Shakhnarovich, Feedforward Semantic Segmentation With Zoom-Out Features, CVPR, 2015.

K. P. Murphy, Y. Weiss, and M. I. Jordan, Loopy belief propagation for approximate inference: An empirical study, 1999.

G. Nagy, Twenty Years of Document Image Analysis in PAMI, Trans. PAMI, vol.22, issue.1, pp.38-62, 2000.

M. Narayana, A. R. Hanson, and E. G. Learned-Miller, Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations, ICCV, 2013.

G. Nebehay and R. Pflugfelder, Consensus-based matching and tracking of keypoints for object tracking, 2014.

L. Neumann and J. Matas, A Method for Text Localization and Recognition in Real-World Images, 2010.

L. Neumann and J. Matas, A real-time scene text to speech system, ECCV workshops, 2012.

L. Neumann and J. Matas, On Combining Multiple Segmentations in Scene Text Recognition, ICDAR, 2013.

L. Neumann and J. Matas, Real-time scene text localization and recognition, CVPR, 2012.

J. C. Niebles, B. Han, and L. Fei-Fei, Efficient Extraction of Human Motion Volumes by Tracking, CVPR, 2010.

T. Novikova, O. Barinova, P. Kohli, and V. S. Lempitsky, Large-Lexicon Attribute-Consistent Text Recognition in Natural Images, ECCV, 2012.

A. Owens and A. A. Efros, Audio-Visual Scene Analysis with Self-Supervised Multisensory Features, ECCV, 2018.

G. Papandreou, T. Zhu, L. Chen, S. Gidaris, J. Tompson et al., PersonLab: Person Pose Estimation and Instance Segmentation with a Part-Based Geometric Embedding Model, ECCV, 2018.

G. Papandreou, T. Zhu, N. Kanazawa, A. Toshev, J. Tompson et al., Towards Accurate Multi-person Pose Estimation in the Wild, 2017.

G. Papandreou, L. Chen, K. Murphy, and A. L. Yuille, Weakly- and semi-supervised learning of a DCNN for semantic image segmentation, ICCV, 2015.

A. Papazoglou and V. Ferrari, Fast object segmentation in unconstrained video, ICCV, 2013.

D. Park and D. Ramanan, N-best maximal decoders for part models, ICCV, 2011.

D. Pathak, P. Krähenbühl, and T. Darrell, Constrained Convolutional Neural Networks for Weakly Supervised Segmentation, ICCV, 2015.

D. Pathak, E. Shelhamer, J. Long, and T. Darrell, Fully convolutional multi-class multiple instance learning, ICLR, 2015.

J. Pearl, Probabilistic Reasoning in Intelligent Systems : Networks of Plausible Inference, 1988.

F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross et al., A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation, CVPR, 2016.

P. Pinheiro, T. Lin, R. Collobert, and P. Dollár, Learning to refine object segments, ECCV, 2016.

P. O. Pinheiro and R. Collobert, From Image-level to Pixel-level Labeling with Convolutional Networks, CVPR, 2015.

J. C. Platt, Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, Advances in Large Margin Classifiers, 1999.

R. Polikar, L. Udpa, S. S. Udpa, and V. Honavar, Learn++: An incremental learning algorithm for supervised neural networks, IEEE Trans. Systems, Man, and Cybernetics, Part C, vol.31, issue.4, pp.497-508, 2001.

A. Prest, C. Leistner, J. Civera, C. Schmid, and V. Ferrari, Learning object class detectors from weakly annotated video, CVPR, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00695940

G. Pundak, T. Sainath, R. Prabhavalkar, A. Kannan, and D. Zhao, Deep context: End-to-end contextual speech recognition, IEEE Spoken Lang. Tech, 2018.

D. Ramanan, D. A. Forsyth, and A. Zisserman, Strike a pose: Tracking people by finding stylized poses, CVPR, 2005.

R. Ratcliff, Connectionist models of recognition memory: constraints imposed by learning and forgetting functions, Psychological review, vol.97, issue.2, p.285, 1990.

S. Rebuffi, A. Kolesnikov, and C. H. Lampert, iCaRL: Incremental Classifier and Representation Learning, 2017.

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS, 2015.

X. Ren, L. Bo, and D. Fox, RGB-(D) Scene Labeling: Features and Algorithms, CVPR, 2012.

J. Revaud, P. Weinzaepfel, Z. Harchaoui, and C. Schmid, EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow, CVPR, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01142656

E. M. Riseman and A. R. Hanson, A Contextual Postprocessing System for Error Correction Using Binary n-Grams, IEEE Trans. Comput, pp.480-493, 1974.

M. Ristin, M. Guillaumin, J. Gall, and L. Van Gool, Incremental Learning of NCM Forests for Large-Scale Image Classification, CVPR, 2014.

M. Rohrbach, S. Amin, M. Andriluka, and B. Schiele, A database for fine grained activity detection of cooking activities, CVPR, 2012.

O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015.

C. Rother, V. Kolmogorov, and A. Blake, Grabcut: Interactive foreground extraction using iterated graph cuts, ACM Trans. Graphics, vol.23, issue.3, pp.309-314, 2004.

B. Russell, A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman, Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, CVPR, 2006.

C. Russell, L. Ladicky, P. Kohli, and P. H. Torr, Exact and Approximate Inference in Associative Hierarchical Networks using Graph Cuts, 2010.

P. Sand and S. Teller, Particle Video: Long-Range Motion Estimation Using Point Trajectories, IJCV, vol.80, issue.1, 2008.

B. Sapp, A. Toshev, and B. Taskar, Cascaded models for articulated pose estimation, ECCV, 2010.

B. Sapp, D. Weiss, and B. Taskar, Parsing human motion with stretchable models, CVPR, 2011.

J. C. Schlimmer and D. H. Fisher, A Case Study of Incremental Concept Induction, 1986.

A. Shahab, F. Shafait, and A. Dengel, ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images, ICDAR, 2011.

G. Sheasby, J. Valentin, N. Crook, and P. H. Torr, A Robust Stereo Prior for Human Segmentation, 2012.

C. Shi, C. Wang, B. Xiao, Y. Zhang, S. Gao et al., Scene Text Recognition Using Part-Based Tree-Structured Character Detection, CVPR, 2013.

J. Shi and J. Malik, Motion segmentation and tracking using normalized cuts, ICCV, 1998.

J. Shi and J. Malik, Normalized Cuts and Image Segmentation, Trans. PAMI, vol.22, issue.8, pp.888-905, 2000.

J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio et al., Real-time human pose recognition in parts from single depth images, CVPR, 2011.

J. Shotton, J. Winn, C. Rother, and A. Criminisi, TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context, IJCV, vol.81, issue.1, pp.2-23, 2009.

H. Sidenbladh, M. Black, and D. Fleet, Stochastic Tracking of 3D Human Figures Using 2D Image Motion, ECCV, 2000.

K. Simonyan and A. Zisserman, Two-Stream Convolutional Networks for Action Recognition in Videos, NIPS, 2014.

K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR, 2015.

C. Sminchisescu and B. Triggs, Estimating articulated human motion with covariance scaled sampling, IJRR, vol.22, issue.6, pp.371-391, 2003.
URL : https://hal.archives-ouvertes.fr/inria-00548242

C. Sminchisescu and A. Jepson, Variational mixture smoothing for non-linear dynamical systems, CVPR, 2004.

A. Stein, D. Hoiem, and M. Hebert, Learning to Extract Object Boundaries using Motion Cues, ICCV, 2007.

C. Sun, A. Shrivastava, C. Vondrick, K. Murphy, R. Sukthankar et al., Actor-centric Relation Network, ECCV, 2018.

N. Sundaram, T. Brox, and K. Keutzer, Dense point trajectories by GPU-accelerated large displacement optical flow, ECCV, 2010.

J. S. Supancic and D. Ramanan, Self-Paced Learning for Long-Term Tracking, CVPR, 2013.

C. Thillou, S. Ferreira, and B. Gosselin, An embedded application for degraded text recognition, EURASIP J. Applied Signal Processing, pp.2127-2135, 2005.

S. Thrun, Is learning the n-th thing any easier than learning the first?, NIPS, 1996.

C. Tomasi and T. Kanade, Detection and tracking of point features, 1991.

A. Vazquez-Reina, S. Avidan, H. Pfister, and E. Miller, Multiple Hypothesis Video Segmentation from Superpixel Flows, ECCV, 2010.

A. Vedaldi and A. Zisserman, Efficient Additive Kernels via Explicit Feature Maps, Trans. PAMI, vol.34, issue.3, pp.480-492, 2012.

A. Vezhnevets, V. Ferrari, and J. M. Buhmann, Weakly supervised structured output learning for semantic segmentation, CVPR, 2012.

P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, CVPR, 2001.

H. Wang, A. Kläser, C. Schmid, and C. Liu, Dense trajectories and motion boundary descriptors for action recognition, IJCV, vol.103, issue.1, pp.60-79, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00725627

J. M. Wang, D. J. Fleet, and A. Hertzmann, Gaussian process dynamical models for human motion, Trans. PAMI, 2008.

K. Wang, B. Babenko, and S. Belongie, End-to-End Scene Text recognition, ICCV, 2011.

J. Weinman, Z. Butler, D. Knoll, and J. Feild, Toward Integrated Scene Text Reading, Trans. PAMI, vol.36, issue.2, pp.375-387, 2014.

J. J. Weinman, E. G. Learned-Miller, and A. R. Hanson, Scene Text Recognition Using Similarity and a Lexicon with Sparse Belief Propagation, Trans. PAMI, vol.31, issue.10, pp.1733-1746, 2009.

P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid, Learning to Detect Motion Boundaries, CVPR, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01142653

D. Weiss, B. Sapp, and B. Taskar, Sidestepping intractable inference with structured ensemble cascades, NIPS, 2010.

Y. Weiss, Smoothness in layers: Motion segmentation using nonparametric mixture estimation, CVPR, 1997.

Y. Weiss, Correctness of local probability propagation in graphical models with loops, Neural computation, 2000.

M. Welling, Herding Dynamical Weights to Learn, ICML, 2009.

P. Werbos, Backpropagation through time: What it does and how to do it, Proc. IEEE, vol.78, pp.1550-1560, 1990.

J. Weston, S. Chopra, and A. Bordes, Memory Networks, ICLR, 2015.

J. Wu, Y. Zhao, J. Zhu, S. Luo, and Z. Tu, MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation, CVPR, 2014.

Y. Wu, J. Lim, and M. Yang, Online Object Tracking: A Benchmark, CVPR, 2013.

C. Xiong, S. Merity, and R. Socher, Dynamic Memory Networks for Visual and Textual Question Answering, ICML, 2016.

J. Yan, M. Cho, H. Zha, X. Yang, and S. M. Chu, Multi-Graph Matching via Affinity Optimization with Graduated Consistency Regularization, Trans. PAMI, 2016.

Y. Yang, S. Hallman, D. Ramanan, and C. Fowlkes, Layered Object Models for Image Segmentation, Trans. PAMI, vol.34, issue.9, pp.1731-1743, 2011.

Y. Yang and D. Ramanan, Articulated Human Detection with Flexible Mixtures-of-Parts, Trans. PAMI, 2012.

Y. Yang and D. Ramanan, Articulated Pose Estimation using Flexible Mixtures of Parts, CVPR, 2011.

B. Yao and L. Fei-Fei, Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities, CVPR, 2010.

C. Yao, X. Bai, B. Shi, and W. Liu, Strokelets: A Learned Multi-scale Representation for Scene Text Recognition, CVPR, 2014.

Q. Ye and D. Doermann, Text Detection and Recognition in Imagery: A survey, Trans. PAMI, vol.37, issue.7, pp.1480-1500, 2015.

H. Zhao, C. Gan, A. Rouditchenko, C. Vondrick, J. McDermott et al., The Sound of Pixels, ECCV, 2018.

S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su et al., Conditional Random Fields as Recurrent Neural Networks, 2015.

C. L. Zitnick and P. Dollár, Edge Boxes: Locating Object Proposals from Edges, ECCV, 2014.

C. L. Zitnick, N. Jojic, and S. B. Kang, Consistent segmentation for optical flow estimation, ICCV, 2005.