References

P. Wagner, Z. Malisz, and S. Kopp, Gesture and speech in interaction: An overview, Speech Communication, vol.57, pp.209-232, 2014.
DOI : 10.1016/j.specom.2013.09.008

D. Knight, The future of multimodal corpora, Revista Brasileira de Linguística Aplicada, vol.83, issue.12, pp.391-415, 2011.
DOI : 10.1016/j.sigpro.2006.02.039

A. Czyzewski, B. Kostek, P. Bratoszewski, J. Kotus, and M. Szykulski, An audio-visual corpus for multimodal automatic speech recognition, Journal of Intelligent Information Systems, vol.50, issue.3, pp.167-192, 2017.
DOI : 10.1007/s10579-015-9302-y

B. Lee, M. Hasegawa-Johnson, C. Goudeseune, S. Kamdar, S. Borys et al., AVICAR: Audio-visual speech corpus in a car environment, International Conference on Spoken Language Processing, pp.2489-2492, 2004.

C. Sanderson and B. C. Lovell, Multi-Region Probabilistic Histograms for Robust and Scalable Identity Inference, pp.199-208, 2009.
DOI : 10.1109/34.598228

J. Trojanová, M. Hrúz, P. Campr, and M. Železný, Design and recording of Czech audio-visual database with impaired conditions for continuous speech recognition, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), 2008.

C. McCool, S. Marcel, A. Hadid, M. Pietikainen, P. Matejka et al., Bi-Modal Person Recognition on a Mobile Phone: Using Mobile Phone Data, 2012 IEEE International Conference on Multimedia and Expo Workshops, pp.635-640, 2012.
DOI : 10.1109/ICMEW.2012.116

E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy, CUAVE: A new audio-visual database for multimodal human-computer interface research, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.2017-2020, 2002.
DOI : 10.1109/icassp.2002.5745028

G. Galatas, G. Potamianos, and F. Makedon, Audio-visual speech recognition incorporating facial depth information captured by Kinect, Proceedings of the 20th European Signal Processing Conference (EUSIPCO 2012), Bucharest, pp.2714-2717, 2012.
DOI : 10.1145/2413097.2413100

G. Galatas, G. Potamianos, D. Kosmopoulos, C. McMurrough, and F. Makedon, Bilingual corpus for AVASR using multiple sensors and depth information, Proceedings of the AVSP2011. Bucharest, pp.103-106, 2011.

C. Sui, S. Haque, R. Togneri, and M. Bennamoun, A 3D audiovisual corpus for speech recognition, Proceedings of the 14th Australasian International Conference on Speech Science and Technology, pp.125-128, 2012.

P. Zelasko, B. Ziółko, T. Jadczyk, and D. Skurzok, AGH corpus of Polish speech, Language Resources and Evaluation, vol.2, issue.1, pp.585-601, 2016.
DOI : 10.5772/16568

Y. Benezeth, G. Bachman, G. Lejan, N. Souviraà-Labastie, and F. Bimbot, BL-Database: A French audiovisual database for speech-driven lip animation systems, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00614761

E. Bailly-Baillière, S. Bengio, F. Bimbot, M. Hamouz, J. Kittler et al., The BANCA Database and Evaluation Protocol, pp.625-638, 2003.
DOI : 10.1007/3-540-44887-X_74

D. Schabus, M. Pucher, and G. Hofer, Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audiovisual speech synthesis, Proceedings of the Eighth International Conference on Language Resources and Evaluation, Istanbul, Turkey: European Language Resources Association (ELRA), pp.3313-3316, 2012.

G. Fanelli, J. Gall, H. Romsdorfer, T. Weise, and L. Van Gool, A 3-D audio-visual corpus of affective communication 3D vision technology for capturing multimodal corpora: Chances and challenges, Proceedings of LREC on Multimodal Corpora, pp.591-598, 2010.

P. J. Besl and N. D. McKay, A method for registration of 3-D shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.14, issue.2, pp.239-256, 1992.
DOI : 10.1109/34.121791

A. V. Barbosa, R. Déchaine, E. Vatikiotis-Bateson, and H. C. Yehia, Quantifying time-varying coordination of multimodal speech signals using correlation map analysis, The Journal of the Acoustical Society of America, vol.131, issue.3, pp.2162-2172, 2012.
DOI : 10.1121/1.3682040

K. Munhall, J. A. Jones, D. E. Callan, T. Kuratate, and E. Vatikiotis-Bateson, Visual Prosody and Speech Intelligibility, Psychological Science, vol.21, issue.2, pp.133-137, 2004.
DOI : 10.1016/S0167-6393(98)00048-X
URL : http://psyc.queensu.ca/~munhallk/Munhall_Psyc.Sci.pdf