N. Aifanti, C. Papachristou, and A. Delopoulos, The MUG facial expression database, WIAMIS, pp. 1-4, 2010.

B. Amos, B. Ludwiczuk, and M. Satyanarayanan, OpenFace: A general-purpose face recognition library with mobile applications, 2016.

A. Brock, J. Donahue, and K. Simonyan, Large scale GAN training for high fidelity natural image synthesis, ICLR, 2019.

Y. Choi, M. Choi, M. Kim, J. Ha, S. Kim et al., StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation, CVPR, 2018.

E. Denton and R. Fergus, Stochastic video generation with a learned prior, ICML, 2018.

E. L. Denton and V. Birodkar, Unsupervised Learning of Disentangled Representations from Video, NIPS, 2017.

H. Dibeklioglu, A. A. Salah, and T. Gevers, Are you really smiling at me? spontaneous versus posed enjoyment smiles, ECCV, 2012.

C. Finn, I. Goodfellow, and S. Levine, Unsupervised learning for physical interaction through video prediction, NIPS, 2016.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets, NIPS, 2014.

L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, Actions as space-time shapes, TPAMI, vol. 29, issue 12, pp. 2247-2253, 2007.

K. Hara, H. Kataoka, and Y. Satoh, Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?, CVPR, 2018.

M. Hoai and F. De la Torre, Max-margin early event detectors, IJCV, vol. 107, issue 2, pp. 191-202, 2014.

P. Isola, J. Zhu, T. Zhou, and A. A. Efros, Image-to-Image Translation with Conditional Adversarial Networks, CVPR, 2017.

Y. Jang, G. Kim, and Y. Song, Video Prediction with Appearance and Motion Conditions, ICML, 2018.

T. Kaneko, K. Hiramatsu, and K. Kashino, Generative attribute controller with conditional filtered generative adversarial networks, CVPR, 2017.

T. Karras, S. Laine, and T. Aila, A style-based generator architecture for generative adversarial networks, CVPR, 2019.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980, 2014.

D. P. Kingma and M. Welling, Auto-encoding variational bayes, ICLR, 2014.

T. Lan, T. Chen, and S. Savarese, A hierarchical representation for future action prediction, ECCV, 2014.

J. Li, X. Liang, Y. Wei, T. Xu, J. Feng et al., Perceptual generative adversarial networks for small object detection, CVPR, 2017.

Y. Li, C. Fang, J. Yang, Z. Wang, X. Lu et al., Flow-grounded spatial-temporal video prediction from still images, ECCV, 2018.

X. Liang, L. Lee, W. Dai, and E. P. Xing, Dual Motion GAN for future-flow embedded video prediction, ICCV, 2017.

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, CVPR, pp. 3431-3440, 2015.

P. Luc, N. Neverova, C. Couprie, J. Verbeek, and Y. LeCun, Predicting deeper into the future of semantic segmentation, ICCV, 2017.

M. Mathieu, C. Couprie, and Y. LeCun, Deep Multi-Scale Video Prediction Beyond Mean Square Error, ICLR, 2016.

M. Mirza and S. Osindero, Conditional generative adversarial nets, arXiv preprint arXiv:1411.1784, 2014.

T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, Spectral normalization for generative adversarial networks, ICLR, 2018.

T. Miyato and M. Koyama, cGANs with projection discriminator, ICLR, 2018.

A. Odena, C. Olah, and J. Shlens, Conditional Image Synthesis With Auxiliary Classifier GANs, ICML, 2017.

S. L. Pintea, J. C. van Gemert, and A. W. Smeulders, Déjà vu: Motion prediction in static images, ECCV, 2014.

A. Radford, L. Metz, and S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, arXiv preprint arXiv:1511.06434, 2015.

F. A. Reda, G. Liu, K. J. Shih, R. Kirby, J. Barker et al., SDC-Net: Video prediction using spatially-displaced convolution, ECCV, 2018.

M. S. Ryoo, Human activity prediction: Early recognition of ongoing activities from streaming videos, ICCV, 2011.

Y. Song, D. Demirdjian, and R. Davis, Tracking Body and Hands For Gesture Recognition: NATOPS Aircraft Handling Signals Database, FG, 2011.

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, NIPS, 2014.

D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun et al., A closer look at spatiotemporal convolutions for action recognition, CVPR, 2018.

S. Tulyakov, M. Liu, X. Yang, and J. Kautz, MoCoGAN: Decomposing motion and content for video generation, CVPR, 2018.

R. Villegas, J. Yang, S. Hong, X. Lin, and H. Lee, Decomposing motion and content for natural video sequence prediction, ICLR, 2017.

C. Vondrick, H. Pirsiavash, and A. Torralba, Generating videos with scene dynamics, NIPS, 2016.

J. Walker, C. Doersch, A. Gupta, and M. Hebert, An uncertain future: Forecasting from static images using variational autoencoders, ECCV, 2016.

J. Walker, A. Gupta, and M. Hebert, Patch to the future: Unsupervised visual prediction, CVPR, 2014.

J. Walker, K. Marino, A. Gupta, and M. Hebert, The pose knows: Video forecasting by generating pose futures, ICCV, 2017.

T. Wang, M. Liu, J. Zhu, G. Liu, A. Tao et al., Video-to-video synthesis, NeurIPS, 2018.

N. Wichers, R. Villegas, D. Erhan, and H. Lee, Hierarchical long-term video prediction without supervision, ICML, 2018.

T. Xue, J. Wu, K. Bouman, and B. Freeman, Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks, NIPS, 2016.

C. Yang, Z. Wang, X. Zhu, C. Huang, J. Shi et al., Pose guided human video generation, ECCV, 2018.

J. Yuen and A. Torralba, A data-driven approach for event prediction, ECCV, 2010.

L. Zhao, X. Peng, Y. Tian, M. Kapadia, and D. Metaxas, Learning to forecast and refine residual motion for image-to-video generation, ECCV, 2018.

J. Zhu, T. Park, P. Isola, and A. A. Efros, Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV, 2017.