K. Boakye, B. Trueba-Hornero, O. Vinyals, and G. Friedland, Overlapped speech detection for improved speaker diarization in multiparty meetings, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4353-4356, 2008.
DOI : 10.1109/ICASSP.2008.4518619

J. Carletta, S. Ashby, S. Bourban, M. Flynn, M. Guillemot et al., The AMI Meeting Corpus: A Pre-announcement, Joint Workshop on Machine Learning and Multimodal Interaction (MLMI), 2005.
DOI : 10.1007/11677482_3

T. Chen and R. Rao, Cross-modal Prediction in Audio-visual Communication, International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2056-2059, 1996.

Y. Huang, O. Vinyals, G. Friedland, C. Müller, N. Mirghafori et al., A fast-match approach for robust, faster than real-time speaker diarization, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), 2007.
DOI : 10.1109/ASRU.2007.4430196

H. Hung, Y. Huang, C. Yeo, and D. Gatica-Perez, Associating audio-visual activity cues in a dominance estimation framework, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008.
DOI : 10.1109/CVPRW.2008.4563178

J. W. Fisher III and T. Darrell, Speaker association with signal-level audiovisual fusion, IEEE Transactions on Multimedia, vol.6, issue.3, pp.406-413, 2004.

J. W. Fisher III, T. Darrell, W. T. Freeman, and P. A. Viola, Learning joint statistical models for audio-visual fusion and segregation, Conference on Neural Information Processing Systems (NIPS), pp.772-778, 2000.

S. J. McKenna, S. Gong, and Y. Raja, Modelling facial colour and identity with Gaussian mixtures, Pattern Recognition, vol.31, issue.12, pp.1883-1892, 1998.
DOI : 10.1016/S0031-3203(98)00066-1

H. J. Nock, G. Iyengar, and C. Neti, Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study, ACM International Conference on Image and Video Retrieval (CIVR), pp.488-499, 2003.
DOI : 10.1007/3-540-45113-7_48
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.68.8423

A. Noulas and B. J. Kröse, On-line multi-modal speaker diarization, Proceedings of the Ninth International Conference on Multimodal Interfaces, ICMI '07, pp.350-357, 2007.
DOI : 10.1145/1322192.1322254

R. Rao and T. Chen, Exploiting audio-visual correlation in coding of talking head sequences, International Picture Coding Symposium, 1996.

D. A. Reynolds, Speaker identification and verification using Gaussian mixture speaker models, Speech Communication, vol.17, issue.1-2, pp.91-108, 1995.
DOI : 10.1016/0167-6393(95)00009-D

D. A. Reynolds and P. Torres-Carrasquillo, Approaches and Applications of Audio Diarization, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005.
DOI : 10.1109/ICASSP.2005.1416463

M. Siracusa and J. Fisher, Dynamic Dependency Tests for Audio-Visual Speaker Association, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, 2007.
DOI : 10.1109/ICASSP.2007.366271
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.6887

S. Tamura, K. Iwano, and S. Furui, Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images, Real World Speech Processing, 2004.

H. Vajaria, T. Islam, S. Sarkar, R. Sankar, and R. Kasturi, Audio Segmentation and Speaker Localization in Meeting Videos, 18th International Conference on Pattern Recognition (ICPR'06), pp.1150-1153, 2006.
DOI : 10.1109/ICPR.2006.283

O. Vinyals and G. Friedland, Towards Semantic Analysis of Conversations: A System for the Live Identification of Speakers in Meetings, 2008 IEEE International Conference on Semantic Computing, 2008.
DOI : 10.1109/ICSC.2008.58

C. Wooters and M. Huijbregts, The ICSI RT07s Speaker Diarization System, Proceedings of the Rich Transcription 2007 Meeting Recognition Evaluation Workshop, 2007.
DOI : 10.1007/978-3-540-68585-2_47

C. Yeo and K. Ramchandran, Compressed domain video processing of meetings for activity estimation in dominance classification and slide transition detection, EECS Dept, 2008.

C. Zhang, P. Yin, Y. Rui, R. Cutler, and P. Viola, Boosting-Based Multimodal Speaker Detection for Distributed Meetings, 2006 IEEE Workshop on Multimedia Signal Processing, 2006.
DOI : 10.1109/MMSP.2006.285274