M. Aitkin and D. Rubin, Estimation and hypothesis testing in finite mixture models, Journal of the Royal Statistical Society. Series B (Methodological), vol.47, issue.1, pp.67-75, 1985.

P. Allen, Integrating vision and touch for object recognition tasks Multisensor integration and fusion for intelligent machines and systems, pp.407-440, 1995.

T. J. Anastasio, P. E. Patton, and K. E. Belkacem-boussaid, Using Bayes' Rule to Model Multisensory Enhancement in the Superior Colliculus, Neural Computation, vol.53, issue.3, pp.1165-1187, 2000.
DOI : 10.1016/S0079-6123(08)63337-3

M. Beal, N. Jojic, and H. Attias, A graphical model for audiovisual object tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.25, issue.7, pp.828-836, 2003.
DOI : 10.1109/TPAMI.2003.1206512

C. Bishop, Pattern recognition and machine learning, 2006.

R. Boyles, On the convergence of EM algorithms, Journal of the Royal Statistical Society: Series B, vol.45, issue.1, pp.47-50, 1983.

J. Castellanos and J. Tardos, Simultaneous map building and localization for mobile robots: a multisensor fusion approach, Proceedings. 1998 IEEE International Conference on Robotics and Automation (Cat. No.98CH36146), 1999.
DOI : 10.1109/ROBOT.1998.677271

G. Celeux, F. Forbes, and N. Peyrard, EM procedures using mean field-like approximations for Markov model-based image segmentation, Pattern Recognition, vol.36, issue.1, pp.131-144, 2003.
DOI : 10.1016/S0031-3203(02)00027-4
URL : https://hal.archives-ouvertes.fr/inria-00072526

N. Checka, K. Wilson, M. Siracusa, and T. Darrell, Multiple person and speaker activity tracking with a particle filter, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.881-884, 2004.
DOI : 10.1109/ICASSP.2004.1327252

Y. Chen and Y. Rui, Real-Time Speaker Tracking Using Particle Filter Sensor Fusion, Proceedings of IEEE, pp.485-494, 2004.
DOI : 10.1109/JPROC.2003.823146

H. Christensen, N. Ma, S. Wrigley, and J. Barker, Integrating pitch and localisation cues at a speech fragment level, Proc. of Interspeech, pp.2769-2772, 2007.

E. Coiras, F. Baralli, and B. Evans, Rigid data association for shallow water surveys, IET Radar, Sonar & Navigation, vol.1, issue.5, pp.354-361, 2007.
DOI : 10.1049/iet-rsn:20070028

D. Comaniciu and P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, issue.5, pp.603-619, 2002.
DOI : 10.1109/34.1000236

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm (with discussion), Journal of the Royal Statistical Society: Series B, vol.39, issue.1, pp.1-38, 1977.

J. Dibiase, H. Silverman, and M. Brandstein, Robust Localization in Reverberant Rooms, Microphone Arrays: Signal Processing Techniques and Applications, 2001.
DOI : 10.1007/978-3-662-04619-7_8

M. O. Ernst and M. S. Banks, Humans integrate visual and haptic information in a statistically optimal fashion, Nature, vol.415, issue.6870, pp.429-433, 2002.
DOI : 10.1038/415429a

O. D. Faugeras, Three dimensional computer vision: A geometric viewpoint, 1993.

I. Fisher, J. W. Darrell, and T. , Speaker Association With Signal-Level Audiovisual Fusion, IEEE Transactions on Multimedia, vol.6, issue.3, pp.406-413, 2004.
DOI : 10.1109/TMM.2004.827503

I. Fisher, J. W. Darrell, T. Freeman, W. T. Viola, and P. , Learning joint statistical models for audio-visual fusion segregation Advances in neural information processing systems, 14, 2001.

D. Forsyth and J. Ponce, Computer vision: A modern approach, 2003.
URL : https://hal.archives-ouvertes.fr/hal-01063327

D. Gatica-perez, G. Lathoud, J. Odobez, and I. Mccowan, Audiovisual Probabilistic Tracking of Multiple Speakers in Meetings, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.2, pp.601-616, 2007.
DOI : 10.1109/TASL.2006.881678

D. L. Hall and S. A. Mcmullen, Mathematical techniques in multisensor data fusion, 2004.

M. Hansard and R. Horaud, Patterns of Binocular Disparity for a Fixating Observer, Proc. of Second International Symposium of Advances in Brain, Vision, and Artificial Intelligence, pp.308-317, 2007.
DOI : 10.1007/978-3-540-75555-5_29
URL : https://hal.archives-ouvertes.fr/inria-00590234

M. Hansard and R. Horaud, Cyclopean geometry of binocular vision, Journal of the Optical Society of America A, vol.25, issue.9, pp.2357-2369, 2008.
DOI : 10.1364/JOSAA.25.002357
URL : https://hal.archives-ouvertes.fr/inria-00435548

C. Harris and M. Stephens, A Combined Corner and Edge Detector, Procedings of the Alvey Vision Conference 1988, pp.147-151, 1988.
DOI : 10.5244/C.2.23

R. Hartley and A. Zisserman, Multiple view geometry in computer vision, 2000.
DOI : 10.1017/CBO9780511811685

S. Haykin and Z. Chen, The Cocktail Party Problem, Neural Computation, vol.31, issue.2, pp.1875-1902, 2005.
DOI : 10.1016/0378-5955(91)90148-3

M. Heckmann, F. Berthommier, and K. Kroschel, Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition, EURASIP Journal on Advances in Signal Processing, vol.2002, issue.11, pp.1260-1273, 2002.
DOI : 10.1155/S1110865702206150

T. Hospedales and S. Vijayakumar, Structure Inference for Bayesian Multisensory Scene Understanding, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.12, pp.2140-2157, 2008.
DOI : 10.1109/TPAMI.2008.25

M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul, An Introduction to Variational Methods for Graphical Models, Learning in graphical models, pp.105-162, 1998.
DOI : 10.1007/978-94-011-5014-9_5

R. Joshi and A. Sanderson, Multisensor fusion: A minimal representation framework, 1999.
DOI : 10.1142/4106

C. Keribin, Consistent estimation of the order of mixture models, Sankhya: the Indian Journal of Statistics, Series A, vol.62, issue.1, pp.49-66, 2000.

A. J. King, The superior colliculus, Current Biology, vol.14, issue.9, pp.335-338, 2004.
DOI : 10.1016/j.cub.2004.04.018

A. J. King, Multisensory Integration: Strategies for Synchronization, Current Biology, vol.15, issue.9, pp.339-341, 2005.
DOI : 10.1016/j.cub.2005.04.022

A. Kushal, M. Rahurkar, L. Fei-fei, J. Ponce, and T. Huang, Audiovisual speaker localization using graphical models, Proc. of the Eighteenth International Conference on Pattern Recognition, pp.291-294, 2006.

I. Laptev, On space-time interest points, Int. J. Comp. Vis, vol.64, pp.2-3, 2005.

S. Majumder, S. Scheding, and H. Durrant-whyte, Multisensor data fusion for underwater navigation, Robotics and Autonomous Systems, vol.35, issue.2, pp.97-108, 2001.
DOI : 10.1016/S0921-8890(00)00126-3

G. Mclachlan and T. Krishnan, The EM algorithm and extensions, 1996.

G. J. Mclachlan and D. Peel, Finite mixture models, 2000.
DOI : 10.1002/0471721182

H. Mitchell, Multi-sensor data fusion, 2007.

H. Naus and C. Van-wijk, Simultaneous localization of multiple emitters, IEE Proceedings Radar Sonar and Navigation, pp.65-70, 2004.

A. V. Nefian, L. Liang, X. Pi, X. Liu, and K. Murphy, Dynamic Bayesian Networks for Audio-Visual Speech Recognition, EURASIP Journal on Advances in Signal Processing, vol.2002, issue.11, pp.2002-1274, 2002.
DOI : 10.1155/S1110865702206083

P. Perez, J. Vermaak, and A. Blake, Data Fusion for Visual Tracking With Particles, Proceedings of the IEEE, vol.92, issue.3, pp.495-513, 2004.
DOI : 10.1109/JPROC.2003.823147

B. Polyak, Introduction to optimization, 1987.

A. Pouget, S. Deneve, and J. Duhamel, A computational perspective on the neural basis of multisensory spatial representations, Nature Reviews Neuroscience, vol.83, issue.9, pp.741-747, 2002.
DOI : 10.1038/nrn914

B. Quinn, G. Mclachlan, and N. L. Hjort, A note on the Aitkin-Rubin approach to hypothesis testing in mixture models, Journal of the Royal Statistical Society. Series B (Methodological), issue.3, pp.49-311, 1987.

G. Schwarz, Estimating the Dimension of a Model, The Annals of Statistics, vol.6, issue.2, pp.461-464, 1978.
DOI : 10.1214/aos/1176344136

X. Shao and J. Barker, Stream weight estimation for multistream audio???visual speech recognition in a multispeaker environment, Speech Communication, vol.50, issue.4, pp.337-353, 2008.
DOI : 10.1016/j.specom.2007.11.002
URL : https://hal.archives-ouvertes.fr/hal-00499201

D. Smith and S. Singh, Approaches to Multisensor Data Fusion in Target Tracking: A Survey, IEEE Transactions on Knowledge and Data Engineering, vol.18, issue.12, pp.1696-1710, 2006.
DOI : 10.1109/TKDE.2006.183

D. Wang and G. J. Brown, Computational auditory scene analysis: Principles , algorithms, and applications, 2006.
DOI : 10.1109/9780470043387

A. Zhigljavsky, Theory of global random search Stochastic global optimization, 1991.