J. Barker and F. Berthommier, Evidence of correlation between acoustic and visual features of speech, Proceedings of the 14th International Congress of Phonetic Sciences (ICPhS '99), International Phonetic Association (IPA), pp.1-7, 1999.

H. Yehia, P. Rubin, and E. Vatikiotis-Bateson, Quantitative association of vocal-tract and facial behavior, Speech Communication, vol.26, issue.1-2, pp.23-43, 1998.
DOI : 10.1016/S0167-6393(98)00048-X

W. Sumby and I. Pollack, Visual Contribution to Speech Intelligibility in Noise, The Journal of the Acoustical Society of America, vol.26, issue.2, p.212, 1954.
DOI : 10.1121/1.1907309

B. Le Goff, T. Guiard-Marigny, M. Cohen, and C. Benoît, Real-time analysis-synthesis and intelligibility of talking faces, 2nd International Conference on Speech Synthesis, pp.53-56, 1994.

S. Ouni, M. M. Cohen, H. Ishak, and D. W. Massaro, Visual contribution to speech perception: measuring the intelligibility of animated talking heads, EURASIP J. Audio Speech Music Process., vol.2007, article ID 47891, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00184425

G. Bailly, M. Bérar, F. Elisei, and M. Odisio, Audiovisual speech synthesis, International Journal of Speech Technology, vol.6, issue.4, pp.331-346, 2003.
DOI : 10.1023/A:1025700715107

URL : https://hal.archives-ouvertes.fr/hal-00169556

B. Theobald, Audiovisual speech synthesis, Proceedings of the International Congress of Phonetic Sciences (ICPhS), International Phonetic Association (IPA), pp.6-10, 2007.

K. Liu and J. Ostermann, Optimization of an image-based talking head system, EURASIP J. Audio Speech Music Process., vol.2009, article ID 174192, 2009.
DOI : 10.1155/2009/174192

J. Edge, A. Hilton, and P. Jackson, Model-based synthesis of visual speech movements from 3D video, EURASIP J. Audio Speech Music Process., vol.2009, article ID 597267, 2009.
DOI : 10.1155/2009/597267

N. Dixon and L. Spitz, The detection of audiovisual desynchrony. Perception, pp.719-721, 1980.

K. Green and P. Kuhl, The role of visual information in the processing of place and manner features in speech perception, Perception & Psychophysics, vol.45, issue.1, pp.34-42, 1989.
DOI : 10.3758/BF03208030

K. Green and P. Kuhl, Integral processing of visual place and auditory voicing information during phonetic perception, Journal of Experimental Psychology: Human Perception and Performance, vol.17, issue.1, pp.278-288, 1991.
DOI : 10.1037/0096-1523.17.1.278

J. Jiang, A. Alwan, P. Keating, E. T. Auer, and L. Bernstein, On the relationship between face movements, tongue movements, and speech acoustics, EURASIP J. Appl. Signal Process., vol.2002, issue.11, pp.1174-1188, 2002.

H. McGurk and J. MacDonald, Hearing lips and seeing voices, Nature, vol.264, issue.5588, pp.746-748, 1976.
DOI : 10.1038/264746a0

W. Mattheyses, L. Latacz, and W. Verhelst, On the importance of audiovisual coherence for the perceived quality of synthesized visual speech, EURASIP J. Audio Speech Music Process., vol.2009, article ID 169819, 2009.
DOI : 10.1155/2009/169819

A. Hunt and A. Black, Unit selection in a concatenative speech synthesis system using a large speech database, Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '96), vol.1, pp.373-376, 1996.
DOI : 10.1109/ICASSP.1996.541110

P. Taylor, Text-to-Speech Synthesis, Cambridge University Press, 2009.
DOI : 10.1017/CBO9780511816338

M. Tamura, S. Kondo, T. Masuko, and T. Kobayashi, Text-to-audio-visual speech synthesis based on parameter generation from HMM, Proceedings of Eurospeech, pp.5-9, 1999.

S. Minnis and A. Breen, Modeling visual coarticulation in synthetic talking heads using a lip motion unit inventory with concatenative synthesis, Proceedings of Interspeech, pp.16-20, 2000.

S. Fagel, Joint audio-visual unit selection - the JAVUS speech synthesizer, Proceedings of the International Conference on Speech and Computer, 2006.

A. Toutios, U. Musti, S. Ouni, V. Colotte, B. Wrobel-Dautcourt et al., Setup for acoustic-visual speech synthesis by concatenating bimodal units, Proceedings of Interspeech 2010, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00526766

S. Maeda, Compensatory articulation during speech: evidence from the analysis and synthesis of vocal-tract shapes using an articulatory model, in Speech Production and Speech Modelling, W. J. Hardcastle and A. Marchal, Eds., Kluwer Academic, pp.131-149, 1990.

R. Clark, K. Richmond, and S. King, Multisyn: Open-domain unit selection for the Festival speech synthesis system, Speech Communication, vol.49, issue.4, pp.317-330, 2007.
DOI : 10.1016/j.specom.2007.01.014

URL : https://hal.archives-ouvertes.fr/hal-00499177

V. Colotte and . Lafosse, Soja: French text-to-speech synthesis system, 2013.

V. Colotte and R. Beaufort, Linguistic features weighting for a text-to-speech system without prosody model, Proceedings of Interspeech, pp.4-8, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00012561

U. Musti, V. Colotte, A. Toutios, and S. Ouni, Introducing visual target cost within an acoustic-visual unit-selection speech synthesizer, International Conference on Auditory-Visual Speech Processing (AVSP 2011), Volterra, p.31, 2011.

V. Robert, B. Wrobel-Dautcourt, Y. Laprie, and A. Bonneau, Inter-speaker variability of labial coarticulation with the view of developing a formal coarticulation model for French, 5th Conference on Auditory-Visual Speech Processing (AVSP 2005), pp.24-27, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00000575

A. Toutios, U. Musti, S. Ouni, and V. Colotte, Weight optimization for bimodal unit-selection talking head synthesis, 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), ISCA, pp.27-31, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00602407

E. Moulines and F. Charpentier, Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones, Speech Communication, vol.9, issue.5-6, pp.453-467, 1990.
DOI : 10.1016/0167-6393(90)90021-Z

V. Colotte and Y. Laprie, Higher precision pitch marking for TD-PSOLA, XI European Signal Processing Conference (EUSIPCO 2002), pp.3-6, 2002.
URL : https://hal.archives-ouvertes.fr/inria-00107610

B. Weiss, C. Kühnel, I. Wechsung, and S. Möller, Web-Based Evaluation of Talking Heads: How Valid Is It?, Proceedings of the 9th International Conference on Intelligent Virtual Agents (IVA '09), pp.14-16, 2009.
DOI : 10.1007/978-3-642-04380-2_87

S. Ouni, M. M. Cohen, and D. W. Massaro, Training Baldi to be multilingual: A case study for an Arabic Badr, Speech Communication, vol.45, issue.2, pp.115-137, 2005.
DOI : 10.1016/j.specom.2004.11.008

URL : https://hal.archives-ouvertes.fr/hal-00008688

M. Mori, The uncanny valley, Energy, vol.7, issue.4, pp.33-35, 1970.

S. Ouni et al., Acoustic-visual synthesis technique using bimodal unit-selection, EURASIP Journal on Audio, Speech, and Music Processing, vol.2013, issue.1, article 16, 2013.

URL : https://hal.archives-ouvertes.fr/hal-00835854