W. Sumby and I. Pollack, Visual contribution to speech intelligibility in noise, Journal of the Acoustical Society of America, vol. 26, no. 2, pp. 212-215, 1954.

B. Le Goff, T. Guiard-Marigny, M. Cohen, and C. Benoît, Real-time analysis-synthesis and intelligibility of talking faces, 2nd International Conference on Speech Synthesis, pp. 53-56, 1994.

S. Ouni, M. M. Cohen, H. Ishak, and D. W. Massaro, Visual contribution to speech perception: measuring the intelligibility of animated talking heads, EURASIP Journal on Audio, Speech, and Music Processing, vol. 2007, no. 1, 2007.
URL: https://hal.archives-ouvertes.fr/hal-00184425

W. Mattheyses and W. Verhelst, Audiovisual speech synthesis: An overview of the state-of-the-art, Speech Communication, vol. 66, pp. 182-217, 2015.

N. F. Dixon and L. Spitz, The detection of audiovisual desynchrony, Perception, vol. 9, pp. 719-721, 1980.

K. P. Green and P. K. Kuhl, The role of visual information in the processing of place and manner features in speech perception, Perception & Psychophysics, vol. 45, pp. 34-42, 1989.

K. P. Green and P. K. Kuhl, Integral processing of visual place and auditory voicing information during phonetic perception, Journal of Experimental Psychology: Human Perception and Performance, vol. 17, pp. 278-288, 1991.

J. Jiang, A. Alwan, P. Keating, E. Auer, and L. Bernstein, On the relationship between face movements, tongue movements, and speech acoustics, EURASIP Journal on Applied Signal Processing, vol. 11, pp. 1174-1188, 2002.

H. McGurk and J. MacDonald, Hearing lips and seeing voices, Nature, vol. 264, pp. 746-748, 1976.

W. J. Hardcastle and N. Hewlett (eds.), Coarticulation: Theory, Data and Techniques, Cambridge University Press, 2006.

P. Liu, Q. Yu, Z. Wu, S. Kang, H. Meng et al., A deep recurrent approach for acoustic-to-articulatory inversion, Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4450-4454, 2015.

P. Zhu, L. Xie, and Y. Chen, Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddings, Proc. of Interspeech, ISCA, pp. 2192-2196, 2015.

T. Biasutto-Lervat and S. Ouni, Phoneme-to-articulatory mapping using bidirectional gated RNN, Proc. of Interspeech, ISCA, pp. 3112-3116, 2018.
URL: https://hal.archives-ouvertes.fr/hal-01862587

C. Curio, M. Breidt, M. Kleiner, Q. C. Vuong, M. A. Giese et al., Semantic 3D motion retargeting for facial animation, Proc. of the 3rd Symposium on Applied Perception in Graphics and Visualization, pp. 77-84, 2006.

Y. Seol, J. Seo, P. H. Kim, J. Lewis, and J. Noh, Artist friendly facial animation retargeting, ACM Transactions on Graphics, vol. 30, no. 6, article 162, 2011.

E. Chuang and C. Bregler, Performance driven facial animation using blendshape interpolation, Computer Science Technical Report, Stanford University, 2002.

L. Dutreve, A. Meyer, and S. Bouakaz, Feature points based facial animation retargeting, Proc. of the ACM Symposium on Virtual Reality Software and Technology, pp. 197-200, 2008.
URL: https://hal.archives-ouvertes.fr/hal-01494990

K. Richmond, P. Hoole, and S. King, Announcing the electromagnetic articulography (day 1) subset of the mngu0 articulatory corpus, Proc. of Interspeech, ISCA, 2011.

A. A. Wrench, A multichannel articulatory database and its application for automatic speech recognition, Proc. of the 5th Seminar on Speech Production, 2000.

S. Galliano, E. Geoffrois, G. Gravier, J. Bonastre, D. Mostefa et al., Corpus description of the ESTER evaluation campaign for the rich transcription of French broadcast news, Proc. of LREC, pp. 139-142, 2006.

Y. Estève, T. Bazillon, J. Antoine, F. Béchet, and J. Farinas, The EPAC corpus: Manual and automatic annotations of conversational speech in French broadcast news, Proc. of LREC, 2010.
URL: https://hal.archives-ouvertes.fr/hal-01433895

G. Gravier, G. Adda, N. Paulson, M. Carré, A. Giraudel et al., The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, Proc. of the Eighth International Conference on Language Resources and Evaluation (LREC), 2012.
URL: https://hal.archives-ouvertes.fr/hal-00712591

S. Taylor, T. Kim, Y. Yue, M. Mahler, J. Krahe et al., A deep learning approach for generalized speech animation, ACM Transactions on Graphics, vol. 36, no. 4, article 93, 2017.

B. Fan, L. Xie, S. Yang, L. Wang, and F. K. Soong, A deep bidirectional LSTM approach for video-realistic talking head, Multimedia Tools and Applications, vol. 75, pp. 5287-5309, 2016.

K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.

H. Siegelmann and E. Sontag, On the computational power of neural nets, Journal of Computer and System Sciences, vol. 50, no. 1, pp. 132-150, 1995.

M. Schuster and K. K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673-2681, 1997.

Y. Bengio, P. Simard, and P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.

K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
URL: https://hal.archives-ouvertes.fr/hal-01433235

T. Karras, T. Aila, S. Laine, A. Herva, and J. Lehtinen, Audio-driven facial animation by joint end-to-end learning of pose and emotion, ACM Transactions on Graphics, vol. 36, no. 4, article 94, 2017.

C. Benoît, T. Lallouache, T. Mohamadi, and C. Abry, A set of French visemes for visual speech synthesis, in Talking Machines: Theories, Models, and Designs, pp. 485-501, 1992.

O. Govokhina, Modèles de trajectoires pour l'animation de visages parlants (Trajectory models for the animation of talking faces), PhD thesis, 2008.

J. L. Ba, J. R. Kiros, and G. E. Hinton, Layer normalization, arXiv preprint arXiv:1607.06450, 2016.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, vol. 323, no. 6088, pp. 533-536, 1986.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, Proc. of the 3rd International Conference on Learning Representations (ICLR), 2015.

M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark et al., Deep contextualized word representations, Proc. of NAACL, 2018.

J. Devlin, M. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805, 2018.