S. Marcel, M. Nixon, and S. Li, Handbook of Biometric Anti-Spoofing: Trusted Biometrics Under Spoofing Attacks, 2014.

H. Bredin and G. Chollet, Making talking-face authentication robust to deliberate imposture, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1693-1696, 2008.
DOI : 10.1109/icassp.2008.4517954

URL : https://hal.archives-ouvertes.fr/hal-01987825

R. Rodrigues, L. Ling, and V. Govindaraju, Robustness of multimodal biometric fusion methods against spoof attacks, Journal of Visual Languages and Computing (JVLC), vol.20, pp.169-179, 2009.
DOI : 10.1016/j.jvlc.2009.01.010

Z. Boulkenafet, J. Komulainen, and A. Hadid, Face spoofing detection using colour texture analysis, IEEE Transactions on Information Forensics and Security, vol.11, issue.8, pp.1818-1830, 2016.
DOI : 10.1109/tifs.2016.2555286

Z. Wu, N. Evans, T. Kinnunen, J. Yamagishi, F. Alegre et al., Spoofing and countermeasures for speaker verification: A survey, Speech Communication, vol.66, issue.1, pp.130-153, 2015.
DOI : 10.1016/j.specom.2014.10.005

URL : http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/spoofingsurvey2014_SPCOM.pdf

E. Argones-Rúa, H. Bredin, C. García-Mateo, G. Chollet, and D. González-Jiménez, Audio-visual speech asynchrony detection using co-inertia analysis and coupled hidden Markov models, Pattern Analysis and Applications, vol.12, issue.3, pp.271-284, 2009.

E. Boutellaa, Z. Boulkenafet, J. Komulainen, and A. Hadid, Audiovisual synchrony assessment for replay attack detection in talking face biometrics, Multimedia Tools and Applications, vol.1, p.3, 2016.
DOI : 10.1007/s11042-015-2848-2

N. Eveno and L. Besacier, A speaker independent "liveness" test for audio-visual biometrics, INTERSPEECH, vol.1, 2005.
DOI : 10.1109/ispa.2005.195419

M. Slaney and M. Covell, FaceSync: A linear operator for measuring synchronization of video facial images and audio tracks, Neural Information Processing Systems (NIPS), vol.1, pp.814-820, 2000.

K. Kollreider, H. Fronthaler, M. I. Faraj, and J. Bigun, Realtime face detection and motion analysis with application in "liveness" assessment, IEEE Transactions on Information Forensics and Security, vol.2, issue.3, pp.548-558, 2007.

A. Melnikov, R. Akhunzyanov, O. Kudashev, and E. Luckyanets, Audiovisual Liveness Detection, International Conference on Image Analysis and Processing (ICIAP), 2015.

J. Komulainen, I. Anina, J. Holappa, E. Boutellaa, and A. Hadid, On the robustness of audiovisual liveness detection to visual speech animation, IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS), 2016.

G. Andrew, R. Arora, J. Bilmes, and K. Livescu, Deep canonical correlation analysis, International Conference on Machine Learning (ICML), vol.1, pp.1247-1255, 2013.

S. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.28, issue.4, pp.357-366, 1980.

G. Chetty and M. Wagner, Liveness detection using crossmodal correlations in face-voice person authentication, INTERSPEECH, issue.2, 2005.

K. Kumar, J. Navratil, E. Marcheret, V. Libal, and G. Potamianos, Robust audio-visual speech synchrony detection by generalized bimodal linear prediction, INTERSPEECH, issue.2, 2009.

A. Aides and H. Aronowitz, Text-dependent audiovisual synchrony detection for spoofing detection in mobile person recognition, INTERSPEECH, vol.2, p.4, 2016.

J. Hershey and J. Movellan, Audio vision: Using audiovisual synchrony to locate sounds, Neural Information Processing Systems (NIPS), pp.813-819, 1999.

G. Chetty, Biometric liveness detection based on cross modal fusion, Proc. of 12th International Conference on Information Fusion, pp.2255-2262, 2009.

J. S. Chung and A. Zisserman, Out of time: Automated lip sync in the wild, Computer Vision - ACCV 2016 Workshops, Springer, pp.251-263, 2017.

A. Torfi, S. M. Iranmanesh, N. Nasrabadi, and J. Dawson, 3D convolutional neural networks for cross audio-visual matching recognition, IEEE Access, vol.5, issue.2, p.22081, 2017.

E. Marcheret, G. Potamianos, J. Vopicka, and V. Goel, Detecting Audio-Visual Synchrony Using Deep Neural Networks, INTERSPEECH, vol.2, p.4, 2015.

T. Kobayashi and N. Otsu, Motion recognition using local auto-correlation of space-time gradients, Pattern Recognition Letters, vol.33, issue.9, pp.1188-1195, 2012.

G. Zhao, M. Pietikäinen, and A. Hadid, Local spatiotemporal descriptors for visual recognition of spoken phrases, ACM International Workshop on Human-centered Multimedia (HCM), pp.57-66, 2007.

J. Sohn, N. Kim, and W. Sung, A statistical model-based voice activity detection, IEEE Signal Processing Letters, vol.6, issue.1, pp.1-3, 1999.

F. K. Soong and A. E. Rosenberg, On the use of instantaneous and transitional spectral information in speaker recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.36, issue.6, pp.871-879, 1988.

D. Hardoon, S. Szedmak, and J. Shawe-Taylor, Canonical correlation analysis: An overview with application to learning methods, Neural Computation, vol.16, issue.12, pp.2639-2664, 2004.

K. Messer, XM2VTSDB: The Extended M2VTS Database, AVBPA, 1999.