H. Nock, G. Iyengar, and C. Neti, Speaker localization using audiovisual synchrony: An empirical study, Proc. CIVR, 2003.
DOI : 10.1007/3-540-45113-7_48
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.68.8423

Z. Li, T. Herfet, M. Grochulla, and T. Thormählen, Audio-visual multiple active speaker localization in reverberant environments, Proc. Int. Conference on Digital Audio Effects, 2012.

E. Kidron, Y. Schechner, and M. Elad, Pixels that Sound, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.274

P. Aarabi and S. Zaky, Robust sound localization using multi-source audiovisual information fusion, Information Fusion, vol.2, issue.3, 2001.
DOI : 10.1016/S1566-2535(01)00035-5

C. Zhang, P. Yin, Y. Rui, R. Cutler, P. Viola et al., Boosting-Based Multimodal Speaker Detection for Distributed Meeting Videos, IEEE Transactions on Multimedia, vol.10, issue.8, 2008.
DOI : 10.1109/TMM.2008.2007344

J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization, 1997.

L. Calmes, G. Lakemeyer, and H. Wagner, Azimuthal sound localization using coincidence of timing across frequency on a robotic platform, The Journal of the Acoustical Society of America, vol.121, issue.4, 2007.
DOI : 10.1121/1.2709866

V. M. Trifa, A. Koene, J. Moren, and G. Cheng, Real-time acoustic source localization in noisy environments for human-robot multimodal interaction, RO-MAN 2007, The 16th IEEE International Symposium on Robot and Human Interactive Communication, 2007.
DOI : 10.1109/ROMAN.2007.4415116

J. Hornstein, M. Lopes, J. Santos-Victor, and F. Lacerda, Sound localization for humanoid robot - building audio-motor maps based on the HRTF, Int. Conference on Intelligent Robots and Systems.

Robust sound source localization using a microphone array on a mobile robot, Int. Conference on Intelligent Robots and Systems, 2003.

J. Chen, J. Benesty, and Y. Huang, Robust time delay estimation exploiting redundancy among multiple microphones, IEEE Transactions on Speech and Audio Processing, 2003.

K. N. Kutulakos and S. M. Seitz, A theory of shape by space carving, Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999.
DOI : 10.1109/ICCV.1999.791235

C. Strecha, W. von Hansen, L. Van Gool, P. Fua, and U. Thoennessen, On benchmarking camera calibration and multi-view stereo for high resolution imagery, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587706

M. S. Brandstein and H. F. Silverman, A practical methodology for speech source localization with microphone arrays, Computer Speech & Language, vol.11, issue.2, 1997.
DOI : 10.1006/csla.1996.0024

Y. Furukawa and J. Ponce, Accurate camera calibration from multiview stereo and bundle adjustment, International Journal of Computer Vision, vol.84, issue.3, 2009.
DOI : 10.1007/s11263-009-0232-2

J. Sanchez-Riera, X. Alameda-Pineda, J. Wienke, A. Deleforge, S. Arias et al., Online multimodal speaker detection for humanoid robots, 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), 2012.
DOI : 10.1109/HUMANOIDS.2012.6651509
URL : https://hal.archives-ouvertes.fr/hal-00768764

X. Alameda-Pineda and R. P. Horaud, Geometrically-constrained robust time delay estimation using non-coplanar microphone arrays, European Signal Processing Conference, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00768763

P. Viola and M. Jones, Robust real-time face detection, International Journal of Computer Vision, 2004.
DOI : 10.1023/b:visi.0000013087.49260.fb
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.9805

X. Alameda-Pineda, V. Khalidov, R. P. Horaud, and F. Forbes, Finding audio-visual events in informal social gatherings, Proceedings of the 13th International Conference on Multimodal Interfaces, ICMI '11, pp.247-254, 2011.
DOI : 10.1145/2070481.2070527
URL : https://hal.archives-ouvertes.fr/inria-00623489

V. Khalidov, F. Forbes, and R. P. Horaud, Alignment of binocular-binaural data using a moving audio-visual target, IEEE Workshop on Multimedia Signal Processing (MMSP'13), Pula (Sardinia), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00861482

A. Badali, J. Valin, F. Michaud, and P. Aarabi, Evaluating realtime audio localization algorithms for artificial audition in robotics, Proc. IROS, 2009.

C. H. Knapp and G. C. Carter, The generalized correlation method for estimation of time delay, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.24, issue.4, 1976.
DOI : 10.1109/TASSP.1976.1162830

M. Omologo and P. Svaizer, Use of the crosspower-spectrum phase in acoustic event location, IEEE Transactions on Speech and Audio Processing, vol.5, issue.3, 1997.
DOI : 10.1109/89.568735

E. Lehmann and A. Johansson, Prediction of energy decay in room impulse responses simulated with an image-source model, The Journal of the Acoustical Society of America, vol.124, issue.1, 2008.
DOI : 10.1121/1.2936367

M. Janvier, X. Alameda-Pineda, L. Girin, and R. Horaud, Sound-event recognition with a companion humanoid, IEEE International Conference on Humanoid Robotics, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00768767

J. Wienke and S. Wrede, A middleware for collaborative research in experimental robotics, 2011 IEEE/SICE International Symposium on System Integration, 2011.

WaldBoost - learning for time constrained sequential detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.

M. Uřičář, V. Franc, and V. Hlaváč, Facial Landmarks Detector Learned by the Structured Output SVM, VISAPP, 2012.
DOI : 10.1007/978-3-642-38241-3_26

V. Franc, S. Sonnenburg, and T. Werner, Cutting-Plane Methods in Machine Learning, pp.185-218, 2012.

S. Duffner and J. Odobez, A track creation and deletion framework for long-term online multi-face tracking, IEEE Transactions on Image Processing, 2013.

R. Gomez, K. Nakamura, T. Kawahara, and K. Nakadai, Multi-party human-robot interaction with distant-talking speech recognition, Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction, HRI '12, 2012.
DOI : 10.1145/2157689.2157835

C. T. Ishi, S. Matsuda, T. Kanda, T. Jitsuhiro, H. Ishiguro et al., A Robust Speech Recognition System for Communication Robots in Noisy Environments, IEEE Transactions on Robotics, vol.24, issue.3, 2008.
DOI : 10.1109/TRO.2008.919305

J. Huang, N. Ohnishi, X. Guo, and N. Sugie, Echo avoidance in a computational model of the precedence effect, Speech Communication, vol.27, 1999.