M. Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

D. F. Fouhey and C. L. Zitnick, Predicting Object Dynamics in Scenes, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.2019-2026, 2014.
DOI : 10.1109/CVPR.2014.260

L. A. Gatys, A. S. Ecker, and M. Bethge, A Neural Algorithm of Artistic Style, Journal of Vision, vol.16, issue.12, 2016.
DOI : 10.1167/16.12.326

D. A. Huang and K. M. Kitani, Action-Reaction: Forecasting the Dynamics of Human Interaction, ECCV, pp.489-504, 2014.
DOI : 10.1007/978-3-319-10584-0_32

J. Johnson, A. Alahi, and L. Fei-Fei, Perceptual Losses for Real-Time Style Transfer and Super-Resolution, 2016.
DOI : 10.1007/978-3-319-46475-6_43

D. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

K. M. Kitani, B. D. Ziebart, J. A. Bagnell, and M. Hebert, Activity Forecasting, ECCV, pp.201-214, 2012.
DOI : 10.1007/978-3-642-33765-9_15

H. S. Koppula and A. Saxena, Anticipating human activities using object affordances for reactive robotic response, pp.14-29, 2016.
DOI : 10.15607/rss.2013.ix.006

T. Lan, T. C. Chen, and S. Savarese, A Hierarchical Representation for Future Action Prediction, ECCV, pp.689-704, 2014.
DOI : 10.1007/978-3-319-10578-9_45

C. Liu, J. Yuen, and A. Torralba, SIFT Flow: Dense Correspondence Across Scenes and Its Applications, pp.978-994, 2011.
DOI : 10.1007/978-3-319-23048-1_2

R. Mottaghi, H. Bagherinezhad, M. Rastegari, and A. Farhadi, Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.383

S. L. Pintea and J. C. van Gemert, Making a Case for Learning Motion Representations with Phase, 2016.

A. Radford, L. Metz, and S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, 2015.

M. Ranzato, A. Szlam, J. Bruna, M. Mathieu, R. Collobert et al., Video (language) modeling: a baseline for generative models of natural videos, 2014.

S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele et al., Generative adversarial text to image synthesis, 2016.

M. Ruder, A. Dosovitskiy, and T. Brox, Artistic Style Transfer for Videos, 2016.
DOI : 10.1007/978-3-319-45886-1_3

M. Saito and E. Matsumoto, Temporal Generative Adversarial Nets, 2016.

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., pp.32-36, 2004.
DOI : 10.1109/ICPR.2004.1334462

J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, Striving for simplicity: The all convolutional net, 2014.

M. Tatarchenko, A. Dosovitskiy, and T. Brox, Multi-view 3D Models from Single Images with a Convolutional Network, ECCV, pp.322-337, 2016.

A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves et al., Conditional image generation with pixelcnn decoders, 2016.

C. Vondrick, H. Pirsiavash, and A. Torralba, Anticipating the future by watching unlabeled video, 2015.

C. Vondrick, H. Pirsiavash, and A. Torralba, Generating videos with scene dynamics, NIPS, pp.613-621, 2016.

J. Walker, C. Doersch, A. Gupta, and M. Hebert, An Uncertain Future: Forecasting from Static Images Using Variational Autoencoders, ECCV, pp.835-851, 2016.
URL : http://arxiv.org/abs/1606.07873

J. Walker, A. Gupta, and M. Hebert, Patch to the Future: Unsupervised Visual Prediction, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.3302-3309, 2014.
DOI : 10.1109/CVPR.2014.416

J. Walker, A. Gupta, and M. Hebert, Dense Optical Flow Prediction from a Static Image, 2015 IEEE International Conference on Computer Vision (ICCV), pp.2443-2451, 2015.
DOI : 10.1109/ICCV.2015.281

J. Yuen and A. Torralba, A Data-Driven Approach for Event Prediction, ECCV, pp.707-720, 2010.
DOI : 10.1007/978-3-642-15552-9_51