T. Chen and R. Rao, Audio-visual integration in multimodal communication, Proceedings of the IEEE, Special Issue on Multimedia Signal Processing, pp. 837-852, 1998.

E. Helander, M. Gabbouj, J. Nurminen, H. Silén, and V. Popa, Speech enhancement, modeling and recognition: algorithms and applications, 2012.

T. Weise, Face/Off: live facial puppetry, Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '09, 2009.
DOI: 10.1145/1599470.1599472

E. Helander and J. Nurminen, On the importance of pure prosody in the perception of speaker identity, Proc. of Interspeech, 2007.

H. Kuwabara and Y. Sagisaka, Acoustic characteristics of speaker individuality: Control and conversion, Speech Communication, vol. 16, pp. 165-173, 1995.
DOI: 10.1016/0167-6393(94)00053-D

R. Aihara, GMM-Based Emotional Voice Conversion Using Spectrum and Prosody Features, American Journal of Signal Processing, vol. 2, no. 5, pp. 134-138, 2012.
DOI: 10.5923/j.ajsp.20120205.06

J. Tao, Y. Kang, and A. Li, Prosody conversion from neutral speech to emotional speech, IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, pp. 1145-1154, 2006.

E. D. Petajan, Automatic lipreading to enhance speech recognition, Proc. IEEE Global Telecommunications Conference, 1984.

R. Frischholz and U. Dieckmann, BioID: a multimodal biometric identification system, IEEE Computer, vol. 33, no. 2, pp. 64-68, 2000.

E. Erzin, Y. Yemez, and A. Tekalp, Multimodal speaker identification using an adaptive classifier cascade based on modality reliability, IEEE Transactions on Multimedia, vol. 7, pp. 840-852, 2005.

Y. Stylianou, O. Cappé, and E. Moulines, Continuous probabilistic transform for voice conversion, IEEE Transactions on Speech and Audio Processing, pp. 131-142, 1998.
DOI: 10.1109/89.661472

A. Kain and M. W. Macon, Spectral voice conversion for text-to-speech synthesis, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1998.

T. Toda, A. W. Black, and K. Tokuda, Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory, IEEE Transactions on Audio, Speech, and Language Processing, pp. 2222-2235, 2007.
DOI: 10.1109/TASL.2007.907344

G. Fanelli, J. Gall, H. Romsdorfer, T. Weise, and L. Van Gool, A 3-D Audio-Visual Corpus of Affective Communication, IEEE Transactions on Multimedia, vol. 12, no. 6, pp. 591-598, 2010.
DOI: 10.1109/TMM.2010.2052239

H. Kawahara, STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds, Acoustical Science and Technology, vol. 27, no. 6, pp. 349-353, 2006.
DOI: 10.1250/ast.27.349

G. Bailly, Lip-Synching Using Speaker-Specific Articulation, Shape and Appearance Models, EURASIP Journal on Audio, Speech, and Music Processing, 2009.

URL: https://hal.archives-ouvertes.fr/hal-00447061

L. Revéret, G. Bailly, and P. Badin, MOTHER: a new generation of talking heads providing a flexible articulatory control for video-realistic speech animation, 2000.