N. Aifanti, C. Papachristou, and A. Delopoulos, The MUG facial expression database, Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), 2010.

B. Albahar and J. Huang, Guided image-to-image translation with bi-directional feature transformation, ICCV, 2019.

D. Bau, J. Zhu, H. Strobelt, B. Zhou, J. B. Tenenbaum et al., GAN dissection: Visualizing and understanding generative adversarial networks, ICLR, 2019.

Y. Bengio, A. Courville, and P. Vincent, Representation learning: A review and new perspectives, IEEE TPAMI, vol.35, issue.8, pp.1798-1828, 2013.

A. Brock, J. Donahue, and K. Simonyan, Large scale GAN training for high fidelity natural image synthesis, ICLR, 2019.

A. Bulat and G. Tzimiropoulos, How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks), ICCV, 2017.

C. Chan, S. Ginosar, T. Zhou, and A. A. Efros, Everybody dance now, ICCV, 2019.

X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever et al., InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets, NIPS, 2016.

H. Dibeklioglu, A. A. Salah, and T. Gevers, Are you really smiling at me? spontaneous versus posed enjoyment smiles, ECCV, 2012.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets, NIPS, 2014.

L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, Actions as space-time shapes, IEEE TPAMI, vol.29, issue.12, pp.2247-2253, 2007.

K. Hara, H. Kataoka, and Y. Satoh, Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?, CVPR, 2018.

M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, NIPS, 2017.

I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot et al., beta-VAE: Learning basic visual concepts with a constrained variational framework, ICLR, 2017.

P. Isola, J. Zhu, T. Zhou, and A. A. Efros, Image-to-image translation with conditional adversarial networks, CVPR, 2017.

Y. Jang, G. Kim, and Y. Song, Video prediction with appearance and motion conditions, ICML, 2018.

T. Karras, T. Aila, S. Laine, and J. Lehtinen, Progressive growing of GANs for improved quality, stability, and variation, ICLR, 2018.

T. Karras, S. Laine, and T. Aila, A style-based generator architecture for generative adversarial networks, CVPR, 2019.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, ICLR, 2015.

D. P. Kingma and M. Welling, Auto-encoding variational Bayes, ICLR, 2014.

C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham et al., Photo-realistic single image super-resolution using a generative adversarial network, CVPR, 2017.

H. Lee, H. Tseng, J. Huang, M. Singh, and M. Yang, Diverse image-to-image translation via disentangled representations, ECCV, 2018.

Y. Li, C. Fang, J. Yang, Z. Wang, X. Lu et al., Flow-grounded spatial-temporal video prediction from still images, ECCV, 2018.

L. Ma, Q. Sun, S. Georgoulis, L. Van Gool, B. Schiele et al., Disentangled person image generation, CVPR, 2018.

T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, Spectral normalization for generative adversarial networks, ICLR, 2018.

J. Pan, C. Wang, X. Jia, J. Shao, L. Sheng et al., Video generation from single semantic label map, CVPR, 2019.

Y. Pu, S. Dai, Z. Gan, W. Wang, G. Wang et al., JointGAN: Multi-domain joint distribution learning with generative adversarial nets, ICML, 2018.

A. Radford, L. Metz, and S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, ICLR, 2016.

A. Romero, P. Arbeláez, L. Van Gool, and R. Timofte, SMIT: Stochastic multi-label image-to-image translation, ICCV Workshops, 2019.

M. Saito, E. Matsumoto, and S. Saito, Temporal generative adversarial nets with singular value clipping, ICCV, 2017.

T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford et al., Improved techniques for training GANs, NIPS, 2016.

M. M. R. Siddiquee, Z. Zhou, N. Tajbakhsh, R. Feng, M. B. Gotway et al., Learning fixed points in generative adversarial networks: From image-to-image translation to disease detection and localization, ICCV, 2019.

K. K. Singh, U. Ojha, and Y. J. Lee, FineGAN: Unsupervised hierarchical disentanglement for fine-grained object generation and discovery, CVPR, 2019.

K. Soomro, A. R. Zamir, and M. Shah, UCF101: A dataset of 101 human action classes from videos in the wild, arXiv preprint arXiv:1212.0402, 2012.

S. Tulyakov, M. Liu, X. Yang, and J. Kautz, MoCoGAN: Decomposing motion and content for video generation, CVPR, 2018.

C. Vondrick, H. Pirsiavash, and A. Torralba, Generating videos with scene dynamics, NIPS, 2016.

J. Walker, C. Doersch, A. Gupta, and M. Hebert, An uncertain future: Forecasting from static images using variational autoencoders, ECCV, 2016.

J. Walker, K. Marino, A. Gupta, and M. Hebert, The pose knows: Video forecasting by generating pose futures, ICCV, 2017.

T. Wang, M. Liu, J. Zhu, G. Liu, A. Tao et al., Video-to-video synthesis, NeurIPS, 2018.

Y. Wang, P. Bilinski, F. Bremond, and A. Dantcheva, ImaGINator: Conditional spatio-temporal GAN for video generation, WACV, 2020.
URL : https://hal.archives-ouvertes.fr/hal-02368319

T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan et al., AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks, CVPR, 2018.

C. Yang, Z. Wang, X. Zhu, C. Huang, J. Shi et al., Pose guided human video generation, ECCV, 2018.

H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, Self-attention generative adversarial networks, ICML, 2019.

B. Zhao, L. Meng, W. Yin, and L. Sigal, Image generation from layout, CVPR, 2019.

L. Zhao, X. Peng, Y. Tian, M. Kapadia, and D. Metaxas, Learning to forecast and refine residual motion for image-to-video generation, ECCV, 2018.