P. Luc, N. Neverova, C. Couprie, J. Verbeek, and Y. LeCun, Predicting deeper into the future of semantic segmentation, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01494296

R. Sutton and A. Barto, Reinforcement learning: An introduction, 1998.

M. Mathieu, C. Couprie, and Y. LeCun, Deep multi-scale video prediction beyond mean square error, 2016.

M. Ranzato, A. Szlam, J. Bruna, M. Mathieu, R. Collobert et al., Video (language) modeling: a baseline for generative models of natural videos, 2014.

N. Srivastava, E. Mansimov, and R. Salakhutdinov, Unsupervised learning of video representations using LSTMs, 2015.

N. Kalchbrenner, A. van den Oord, K. Simonyan, I. Danihelka, O. Vinyals et al., Video pixel networks. In: ICML, 2017.

S. Shalev-Shwartz, N. Ben-Zrihem, A. Cohen, and A. Shashua, Long-term planning by short-term prediction, 2016.

S. Shalev-Shwartz and A. Shashua, On the sample complexity of end-to-end training vs. semantic abstraction training, 2016.

K. He, G. Gkioxari, P. Dollár, and R. Girshick, Mask R-CNN. In: ICCV, 2017.

I. Kokkinos, Ubernet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory, 2017.

R. Villegas, J. Yang, S. Hong, X. Lin, and H. Lee, Decomposing motion and content for natural video sequence prediction, 2017.

R. Villegas, J. Yang, Y. Zou, S. Sohn, X. Lin et al., Learning to generate long-term future via hierarchical prediction, 2017.

J. Walker, C. Doersch, A. Gupta, and M. Hebert, An uncertain future: Forecasting from static images using variational autoencoders, 2016.

T. Lan, T. C. Chen, and S. Savarese, A hierarchical representation for future action prediction, 2014.

K. Kitani, B. Ziebart, J. Bagnell, and M. Hebert, Activity forecasting. In: ECCV, 2012.

N. Lee, W. Choi, P. Vernaza, C. Choy, P. Torr et al., DESIRE: distant future prediction in dynamic scenes with interacting agents, 2017.

A. Dosovitskiy and V. Koltun, Learning to act by predicting the future, 2017.

C. Vondrick, H. Pirsiavash, and A. Torralba, Anticipating the future by watching unlabeled video. In: CVPR, 2016.

X. Jin, H. Xiao, X. Shen, J. Yang, Z. Lin et al., Predicting scene parsing and motion dynamics in the future, 2017.

B. Romera-Paredes and P. Torr, Recurrent instance segmentation. In: ECCV, 2016.

M. Bai and R. Urtasun, Deep watershed transform for instance segmentation, 2017.

P. Pinheiro, T. Y. Lin, R. Collobert, and P. Dollár, Learning to refine object segments, 2016.

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, 2015.

T. Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan et al., Feature pyramid networks for object detection, 2017.

M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler et al., The Cityscapes dataset for semantic urban scene understanding, 2016.

T. Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick et al., Microsoft COCO: common objects in context. In: ECCV, 2014.

A. Yang, J. Wright, Y. Ma, and S. Sastry, Unsupervised segmentation of natural images via lossy data compression, CVIU, vol. 110, issue 2, pp. 212-225, 2008.

C. Pantofaru and M. Hebert, A comparison of image segmentation algorithms, 2005.

D. Martin, C. Fowlkes, D. Tal, and J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, 2001.

M. Meilă, Comparing clusterings: An axiomatic view, 2005.

F. Yu and V. Koltun, Multi-scale context aggregation by dilated convolutions, 2016.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets. In: NIPS, 2014.

D. Kingma and M. Welling, Auto-encoding variational Bayes, 2014.

G. Gkioxari and J. Malik, Finding action tubes. In: CVPR, 2015.

Q. Chen and V. Koltun, Full flow: Optical flow estimation by global optimization over regular grids, 2016.