X. Alameda-Pineda, V. Khalidov, R. P. Horaud, and F. Forbes, Finding audio-visual events in informal social gatherings, Proceedings of the 13th International Conference on Multimodal Interfaces, ICMI '11, pp.247-254, 2011.
DOI : 10.1145/2070481.2070527

URL : https://hal.archives-ouvertes.fr/inria-00623489

C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.

M. Brookes, VOICEBOX: Speech processing toolbox for MATLAB

W. Jiang, C. Cotton, S. Chang, D. Ellis, and A. Loui, Short-term audio-visual atoms for generic video concept classification, Proceedings of the 17th ACM International Conference on Multimedia, MM '09, 2009.
DOI : 10.1145/1631272.1631277

V. Khalidov, F. Forbes, and R. Horaud, Conjugate Mixture Models for Clustering Multimodal Data, Neural Computation, vol.23, issue.2, pp.517-557, 2011.
DOI : 10.1162/NECO_a_00074

L. Lacheze, Y. Guo, R. Benosman, B. Gas, and C. Couverture, Audio/video fusion for objects recognition, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009.
DOI : 10.1109/IROS.2009.5354442

I. Laptev, On space-time interest points, International Journal of Computer Vision, vol.64, issue.2-3, 2005.

M. Liu, Y. Fu, and T. S. Huang, An audio-visual fusion framework with joint dimensionality reduction, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008.

J. Lopes and S. Singh, Audio and Video Feature Fusion for Activity Recognition in Unconstrained Videos, Intelligent Data Engineering and Automated Learning, 2006.
DOI : 10.1007/11875581_99

J. Luo, B. Caputo, A. Zweig, J. Bach, and J. Anemüller, Object Category Detection Using Audio-Visual Cues, Proceedings of the 6th International Conference on Computer Vision Systems, 2008.
DOI : 10.1007/978-3-540-79547-6_52

L. R. Rabiner and R. W. Schafer, Theory and Applications of Digital Speech Processing, 2011.

V. Ramasubramanian, R. Karthik, S. Thiyagarajan, and S. Cherla, Continuous audio analytics by HMM and Viterbi decoding, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2396-2399, 2011.
DOI : 10.1109/ICASSP.2011.5946966

K. Saenko and T. Darrell, Object Category Recognition Using Probabilistic Fusion of Speech and Image Classifiers, Proceedings of the 4th International Conference on Machine Learning for Multimodal Interaction, 2008.
DOI : 10.1007/978-3-540-78155-4_4

J. Sanchez-Riera, J. Cech, and R. Horaud, Action recognition robust to background clutter by using stereo vision, 4th International Workshop on Video Event Categorization, Tagging and Retrieval (VECTaR), in conjunction with IEEE European Conference on Computer Vision, 2012.

J. Cech, J. Sanchez-Riera, and R. P. Horaud, Scene flow estimation by growing correspondence seeds, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2011.

Z. Xiong, Audio-visual sports highlights extraction using Coupled Hidden Markov Models, Pattern Analysis and Applications, vol.10, issue.2, pp.62-71, 2005.
DOI : 10.1007/s10044-005-0244-7