Finding audio-visual events in informal social gatherings, Proceedings of the 13th international conference on multimodal interfaces, ICMI '11, pp.247-254, 2011. ,
DOI : 10.1145/2070481.2070527
URL : https://hal.archives-ouvertes.fr/inria-00623489
Coordinating human-robot communication, 2007. ,
Real-time multimodal human-avatar interaction, IEEE Trans. on Circuits and Systems for Video Technology, vol.18, issue.4, pp.467-477, 2008. ,
Audio-visual human recognition using semi-supervised spectral learning and hidden Markov models, Journal of Visual Languages & Computing, vol.20, issue.3, pp.188-195, 2009. ,
DOI : 10.1016/j.jvlc.2009.01.009
Audiovisual Discrimination Between Speech and Laughter: Why and When Visual Information Might Help, IEEE Transactions on Multimedia, vol.13, issue.2, pp.216-234, 2011. ,
DOI : 10.1109/TMM.2010.2101586
Audio-Visual Event Recognition in Surveillance Video Sequences, IEEE Transactions on Multimedia, vol.9, issue.2, pp.257-267, 2007. ,
DOI : 10.1109/TMM.2006.886263
Calibration of Audio-Video Sensors for Multi-Modal Event Indexing, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, pp.741-744, 2007. ,
DOI : 10.1109/ICASSP.2007.366342
Onsets Coincidence for Cross-Modal Analysis, IEEE Transactions on Multimedia, vol.12, issue.2, pp.108-120, 2010. ,
DOI : 10.1109/TMM.2009.2037387
Blind Audiovisual Source Separation Based on Sparse Redundant Representations, IEEE Transactions on Multimedia, vol.12, issue.5, pp.358-371, 2010. ,
DOI : 10.1109/TMM.2010.2050650
URL : https://hal.archives-ouvertes.fr/inria-00541412
Audiovisual Probabilistic Tracking of Multiple Speakers in Meetings, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.2, pp.601-616, 2007. ,
DOI : 10.1109/TASL.2006.881678
Sequential Monte Carlo fusion of sound and vision for speaker tracking, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, pp.741-746, 2001. ,
DOI : 10.1109/ICCV.2001.937600
Data Fusion for Visual Tracking With Particles, Proceedings of the IEEE, pp.495-513, 2004. ,
DOI : 10.1109/JPROC.2003.823147
Detection and localization of 3D audio-visual objects using unsupervised clustering, Proc. of ICMI, 2008. ,
Audiovisual Information Fusion in Human-Computer Interfaces and Intelligent Environments: A Survey, Proceedings of the IEEE, pp.1692-1715, 2010. ,
DOI : 10.1109/JPROC.2010.2057231
Multisensory integration: current issues from the perspective of the single neuron, Nature Reviews Neuroscience, vol.31, issue.4, pp.255-266, 2008. ,
DOI : 10.1016/j.neuron.2007.12.013
Visual influences on auditory spatial learning, Philosophical Transactions of the Royal Society B: Biological Sciences, vol.24, issue.17, pp.331-339, 2009. ,
DOI : 10.1523/JNEUROSCI.0199-04.2004
A graphical model for audiovisual object tracking, Proc. of IEEE Conference on Acoustics, Speech, and Signal Processing, pp.828-836, 2003. ,
DOI : 10.1109/TPAMI.2003.1206512
Structure Inference for Bayesian Multisensory Scene Understanding, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.12, pp.2140-2157, 2008. ,
DOI : 10.1109/TPAMI.2008.25
Conjugate Mixture Models for Clustering Multimodal Data, Neural Computation, vol.49, issue.3, pp.517-557, 2011. ,
DOI : 10.1007/978-94-011-3436-1
URL : https://hal.archives-ouvertes.fr/inria-00590267
A joint particle filter for audio-visual speaker tracking, Proceedings of the 7th international conference on Multimodal interfaces, ICMI '05, pp.61-68, 2005. ,
DOI : 10.1145/1088463.1088477
Joint Audio-Visual Tracking Using Particle Filters, EURASIP Journal on Advances in Signal Processing, vol.2002, issue.11, pp.1154-1164, 2002. ,
DOI : 10.1155/S1110865702206058
A Probabilistic Model for Binaural Sound Localization, IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), vol.36, issue.5, pp.982-994, 2006. ,
DOI : 10.1109/TSMCB.2006.872263
Automatic position calibration of multiple microphones, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.69-72, 2004. ,
DOI : 10.1109/ICASSP.2004.1326765
Microphone array position calibration by basis-point classical multidimensional scaling, IEEE Transactions on Speech and Audio Processing, vol.13, issue.5, pp.1025-1034, 2005. ,
DOI : 10.1109/TSA.2005.851893
Affine structure from sound, Proceedings of Conference on Neural Information Processing Systems (NIPS), 2005. ,
Direct computation of sound and microphone locations from time-difference-of-arrival data, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.2445-2448, 2008. ,
DOI : 10.1109/ICASSP.2008.4518142
Self-localizing dynamic microphone arrays, IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), vol.32, issue.4, pp.474-484, 2002. ,
DOI : 10.1109/TSMCB.2002.804369
Maximum-likelihood source localization and unknown sensor location estimation for wideband signals in the near-field, IEEE Transactions on Signal Processing, vol.50, issue.8, pp.1843-1854, 2002. ,
DOI : 10.1109/TSP.2002.800420
Microphone Arrays as Generalized Cameras for Integrated Audio Visual Processing, 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2007. ,
DOI : 10.1109/CVPR.2007.383345
Multiple View Geometry in Computer Vision, 2003. ,
DOI : 10.1017/CBO9780511811685
Coordinate-free calibration of an acoustically driven camera pointing system, 2008 Second ACM/IEEE International Conference on Distributed Smart Cameras, pp.1-9, 2008. ,
DOI : 10.1109/ICDSC.2008.4635685
Harmony in Motion, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007. ,
DOI : 10.1109/CVPR.2007.383344
The cocktail party robot, Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction, HRI '12, 2012. ,
DOI : 10.1145/2157689.2157834
URL : https://hal.archives-ouvertes.fr/hal-00768668
Introduction to Stochastic Search and Optimization: Estimation, Simulation and Control, 2003. ,
The EM Algorithm and Extensions, 2007. ,
Integrating pitch and localisation cues at a speech fragment level, Proc. of Interspeech, pp.2769-2772, 2007. ,