O. Yilmaz and S. Rickard, Blind separation of speech mixtures via time-frequency masking, IEEE Transactions on Signal Processing, vol.52, issue.7, pp.1830-1847, 2004.

M. I. Mandel, R. J. Weiss, and D. P. Ellis, Model-based expectation-maximization source separation and localization, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.2, pp.382-394, 2010.

Y. Dorfan and S. Gannot, Tree-based recursive expectation-maximization algorithm for localization of acoustic sources, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, issue.10, pp.1692-1703, 2015.

O. Schwartz and S. Gannot, Speaker tracking using recursive EM algorithms, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, pp.392-402, 2014.
DOI : 10.1109/taslp.2013.2292361

J. Vermaak and A. Blake, Nonlinear filtering for speaker tracking in noisy and reverberant environments, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.5, pp.3021-3024, 2001.

J. Valin, F. Michaud, and J. Rouat, Robust localization and tracking of simultaneous moving sound sources using beamforming and particle filtering, Robotics and Autonomous Systems, vol.55, issue.3, pp.216-228, 2007.

C. Evers, A. H. Moore, P. A. Naylor, J. Sheaffer, and B. Rafaely, Bearing-only acoustic tracking of moving speakers for robot audition, IEEE International Conference on Digital Signal Processing, pp.1206-1210, 2015.

S. Ba, X. Alameda-Pineda, A. Xompero, and R. Horaud, An on-line variational Bayesian model for multi-person tracking from cluttered scenes, Computer Vision and Image Understanding, vol.153, pp.64-76, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01349763

I. Gebru, S. Ba, X. Li, and R. Horaud, Audio-visual speaker diarization based on spatiotemporal Bayesian fusion, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01413403

Y. Ban, L. Girin, X. Alameda-Pineda, and R. Horaud, Exploiting the complementarity of audio and visual data in multi-speaker tracking, ICCV Workshop on Computer Vision for Audio-Visual Media, vol.3, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01577965

Y. Avargel and I. Cohen, System identification in the short-time Fourier transform domain with crossband filtering, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.4, pp.1305-1319, 2007.

R. Talmon, I. Cohen, and S. Gannot, Relative transfer function identification using convolutive transfer function approximation, IEEE Transactions on Audio, Speech, and Language Processing, vol.17, issue.4, pp.546-555, 2009.
DOI : 10.1109/tasl.2008.2009576

X. Li, L. Girin, R. Horaud, and S. Gannot, Estimation of the direct-path relative transfer function for supervised sound-source localization, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.11, pp.2171-2186, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01349691

X. Li, L. Girin, R. Horaud, and S. Gannot, Multiple-speaker localization based on direct-path features and likelihood maximization with spatial sparsity regularization, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.10, pp.1997-2012, 2017.

X. Li, B. Mourgue, L. Girin, S. Gannot, and R. Horaud, Online localization of multiple moving speakers in reverberant environments, The Tenth IEEE Workshop on Sensor Array and Multichannel Signal Processing, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01795462

X. Li, Y. Ban, L. Girin, X. Alameda-Pineda, and R. Horaud, Online localization and tracking of multiple moving speakers in reverberant environment, submitted to IEEE Journal of Selected Topics in Signal Processing, 2018.

J. Kivinen and M. K. Warmuth, Exponentiated gradient versus gradient descent for linear predictors, Information and Computation, vol.132, issue.1, pp.1-63, 1997.
DOI : 10.1006/inco.1996.2612

Y. Ban, X. Alameda-Pineda, F. Badeig, S. Ba, and R. Horaud, Tracking a varying number of people with a visually-controlled robotic head, IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.4144-4151, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01542987

H. W. Löllmann, C. Evers, A. Schmidt, H. Mellmann, H. Barfuss et al., The LOCATA challenge data corpus for acoustic source localization and tracking, IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), 2018.

G. Xu, H. Liu, L. Tong, and T. Kailath, A least-squares approach to blind channel identification, IEEE Transactions on Signal Processing, vol.43, issue.12, pp.2982-2993, 1995.

X. Li, L. Girin, R. Horaud, and S. Gannot, Estimation of relative transfer function in the presence of stationary noise based on segmental power spectral density matrix subtraction, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.320-324, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01119186

I. D. Gebru, X. Alameda-Pineda, F. Forbes, and R. Horaud, EM algorithms for weighted-data clustering with application to audio-visual scene analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, pp.2402-2415, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01261374