Q. Summerfield, Some preliminaries to a comprehensive account of audio-visual speech perception, in Hearing by Eye: The Psychology of Lipreading, pp.3-51, 1987.

W. H. Sumby and I. Pollack, Visual Contribution to Speech Intelligibility in Noise, The Journal of the Acoustical Society of America, vol.26, issue.2, pp.212-215, 1954.
DOI : 10.1121/1.1907309

L. Girin, J. Schwartz, and G. Feng, Audio-visual enhancement of speech in noise, The Journal of the Acoustical Society of America, vol.109, issue.6, pp.3007-3020, 2001.
DOI : 10.1121/1.1358887

S. Deligne, G. Potamianos, and C. Neti, Audio-visual speech enhancement with AVCDCN (Audiovisual Codebook Dependent Cepstral Normalization), Proc. Int. Conf. Spoken Language Proc. (ICSLP), pp.1449-1452, 2002.

R. Goecke, G. Potamianos, and C. Neti, Noisy audio feature enhancement using audio-visual speech data, Proc. IEEE Int. Conf. Acoustics, Speech, Signal Proc. (ICASSP), pp.2025-2028, 2002.
DOI : 10.1109/icassp.2002.5745030

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.2650

G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, Recent advances in the automatic recognition of audiovisual speech, Proc. IEEE, pp.1306-1326, 2003.

S. Lucey, T. Chen, S. Sridharan, and V. Chandran, Integration strategies for audio-visual speech processing: applied to text-dependent speaker recognition, IEEE Transactions on Multimedia, vol.7, issue.3, pp.495-506, 2005.
DOI : 10.1109/TMM.2005.846777

D. Sodoyer, L. Girin, C. Jutten, and J. Schwartz, Developing an audio-visual speech source separation algorithm, Speech Communication, vol.44, issue.1-4, pp.113-125, 2004.
DOI : 10.1016/j.specom.2004.10.002

URL : https://hal.archives-ouvertes.fr/hal-00186591

R. Dansereau, Co-channel audiovisual speech separation using spectral matching constraints, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.645-648, 2004.
DOI : 10.1109/ICASSP.2004.1327193

S. Rajaram, A. V. Nefian, and T. Huang, Bayesian separation of audiovisual speech sources, Proc. IEEE Int. Conf. Acoustics, Speech, Signal Proc. (ICASSP), pp.657-660, 2004.

B. Rivet, L. Girin, and C. Jutten, Mixing Audiovisual Speech Processing and Blind Source Separation for the Extraction of Speech Signals From Convolutive Mixtures, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.1, pp.96-108, 2007.
DOI : 10.1109/TASL.2006.872619

URL : https://hal.archives-ouvertes.fr/hal-00174100

W. Wang, D. Cosker, Y. Hicks, S. Sanei, and J. Chambers, Video Assisted Speech Source Separation, Proc. IEEE Int. Conf. Acoustics, Speech, Signal Proc. (ICASSP), pp.425-428, 2005.
DOI : 10.1109/ICASSP.2005.1416331

G. Chetty and M. Wagner, Audio Visual Speaker Verification Based on Hybrid Fusion of Cross Modal Features, Pattern Recognition and Machine Intelligence (PReMI), pp.469-478, 2007.
DOI : 10.1007/978-3-540-77046-6_58

H. G. Okuno and K. Nakadai, Real-time Sound Source Localization and Separation based on Active Audio-Visual Integration, IWANN (1), ser. Lecture Notes in Computer Science, pp.118-125, 2003.
DOI : 10.1007/3-540-44868-3_16

J. Fritsch, M. Kleinehagenbrock, S. Lang, G. A. Fink, and G. Sagerer, Audiovisual person tracking with a mobile robot, Proc. Int. Conf. on Intelligent Autonomous Systems, pp.898-906, 2004.

C. Saraceno and R. Leonardi, Indexing audiovisual databases through joint audio and video processing, International Journal of Imaging Systems and Technology, vol.9, issue.5, pp.320-331, 1998.
DOI : 10.1002/(SICI)1098-1098(1998)9:5<320::AID-IMA2>3.0.CO;2-C

E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy, Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus, EURASIP Journal on Advances in Signal Processing, vol.2002, issue.11, p.1189, 2002.
DOI : 10.1155/S1110865702206101

Z. Barzelay and Y. Y. Schechner, Harmony in Motion, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383344

C. Sigg, B. Fischer, B. Ommer, V. Roth, and J. Buhmann, Nonnegative CCA for Audiovisual Source Separation, 2007 IEEE Workshop on Machine Learning for Signal Processing, 2007.
DOI : 10.1109/MLSP.2007.4414315

L. Benaroya and F. Bimbot, Wiener based source separation with HMM/GMM using a single sensor, Proc. 4th Int. Symp. on Independent Component Anal. and Blind Signal Separation (ICA2003), pp.957-961, 2003.

S. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing, vol.41, issue.12, pp.3397-3415, 1993.
DOI : 10.1109/78.258082

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.335.5769

O. Divorra Escoda, G. Monaci, R. M. Figueras i Ventura, P. Vandergheynst, and M. Bierlaire, Geometric Video Approximation Using Weighted Matching Pursuit, IEEE Transactions on Image Processing, vol.18, issue.8, pp.1703-1716, 2009.
DOI : 10.1109/TIP.2009.2021315

G. Monaci, O. Divorra, and P. Vandergheynst, Analysis of multimodal sequences using geometric video representations, Signal Processing, vol.86, issue.12, pp.3534-3548, 2006.
DOI : 10.1016/j.sigpro.2006.02.044

A. Llagostera-Casanovas, G. Monaci, and P. Vandergheynst, Blind audiovisual source separation using sparse redundant representations, 2007.

A. Ozerov, P. Philippe, F. Bimbot, and R. Gribonval, Adaptation of Bayesian Models for Single-Channel Source Separation and its Application to Voice/Music Separation in Popular Songs, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.5, pp.1564-1578, 2007.
DOI : 10.1109/TASL.2007.899291

URL : https://hal.archives-ouvertes.fr/inria-00544774

R. Baken and R. Orlikoff, Clinical Measurement of Speech and Voice, 2000.

E. Vincent, R. Gribonval, and C. Févotte, Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.4, pp.1462-1469, 2006.
DOI : 10.1109/TSA.2005.858005

URL : https://hal.archives-ouvertes.fr/inria-00544230

E. Vincent, R. Gribonval, and M. D. Plumbley, Oracle estimators for the benchmarking of source separation algorithms, Signal Processing, vol.87, issue.8, pp.1933-1950, 2007.
DOI : 10.1016/j.sigpro.2007.01.016

URL : https://hal.archives-ouvertes.fr/inria-00545156

O. Yilmaz and S. Rickard, Blind Separation of Speech Mixtures via Time-Frequency Masking, IEEE Transactions on Signal Processing, vol.52, issue.7, pp.1830-1847, 2004.
DOI : 10.1109/TSP.2004.828896