C. Knapp and G. C. Carter, The generalized correlation method for estimation of time delay, IEEE Transactions on Acoustics, Speech and Signal Processing, vol.24, issue.4, pp.320-327, 1976.

J. Chen, J. Benesty, and Y. Huang, Time delay estimation in room acoustic environments: an overview, EURASIP Journal on Applied Signal Processing, pp.170-170, 2006.

J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, Robust localization in reverberant rooms, Microphone Arrays, pp.157-180, 2001.

C. T. Ishi, O. Chatot, H. Ishiguro, and N. Hagita, Evaluation of a MUSIC-based real-time sound localization of multiple sound sources in real noisy environments, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.2027-2032, 2009.

O. Yilmaz and S. Rickard, Blind separation of speech mixtures via time-frequency masking, IEEE Transactions on Signal Processing, vol.52, issue.7, pp.1830-1847, 2004.
DOI : 10.1109/tsp.2004.828896

M. I. Mandel, R. J. Weiss, and D. P. Ellis, Model-based expectation-maximization source separation and localization, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.2, pp.382-394, 2010.

Y. Dorfan and S. Gannot, Tree-based recursive expectation-maximization algorithm for localization of acoustic sources, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, issue.10, pp.1692-1703, 2015.
DOI : 10.1109/taslp.2015.2444654

Y. Huang and J. Benesty, Adaptive multichannel time delay estimation based on blind system identification for acoustic source localization, Adaptive Signal Processing, pp.227-247, 2003.
DOI : 10.1007/978-3-662-11028-7_8

S. Doclo and M. Moonen, Robust adaptive time delay estimation for speaker localization in noisy and reverberant acoustic environments, EURASIP Journal on Applied Signal Processing, pp.1110-1124, 2003.
DOI : 10.1155/s111086570330602x
URL : https://asp-eurasipjournals.springeropen.com/track/pdf/10.1155/S111086570330602X

T. G. Dvorkind and S. Gannot, Time difference of arrival estimation of speech source in a noisy and reverberant environment, Signal Processing, vol.85, issue.1, pp.177-204, 2005.

K. Kowalczyk, E. A. Habets, W. Kellermann, and P. A. Naylor, Blind system identification using sparse learning for TDOA estimation of room reflections, IEEE Signal Processing Letters, vol.20, issue.7, pp.653-656, 2013.
DOI : 10.1109/lsp.2013.2261059

X. Li, L. Girin, R. Horaud, and S. Gannot, Multiple-speaker localization based on direct-path features and likelihood maximization with spatial sparsity regularization, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.10, pp.1997-2012, 2017.
DOI : 10.1109/taslp.2017.2740001
URL : https://hal.archives-ouvertes.fr/hal-01413417

Y. Avargel and I. Cohen, System identification in the short-time Fourier transform domain with crossband filtering, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.4, pp.1305-1319, 2007.
DOI : 10.1109/tasl.2006.889720

R. Talmon, I. Cohen, and S. Gannot, Relative transfer function identification using convolutive transfer function approximation, IEEE Transactions on Audio, Speech, and Language Processing, vol.17, issue.4, pp.546-555, 2009.
DOI : 10.1109/tasl.2008.2009576

D. Pavlidi, A. Griffin, M. Puigt, and A. Mouchtaris, Real-time multiple sound source localization and counting using a circular microphone array, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.10, pp.2193-2206, 2013.
DOI : 10.1109/tasl.2013.2272524
URL : https://hal.archives-ouvertes.fr/hal-01367320

O. Schwartz and S. Gannot, Speaker tracking using recursive EM algorithms, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, pp.392-402, 2014.
DOI : 10.1109/taslp.2013.2292361

N. Roman and D. Wang, Binaural tracking of multiple moving sources, IEEE Transactions on Audio, Speech, and Language Processing, vol.16, issue.4, pp.728-739, 2008.

C. Evers, A. H. Moore, P. A. Naylor, J. Sheaffer, and B. Rafaely, Bearing-only acoustic tracking of moving speakers for robot audition, IEEE International Conference on Digital Signal Processing (DSP), pp.1206-1210, 2015.
DOI : 10.1109/icdsp.2015.7252071
URL : http://www.commsp.ee.ic.ac.uk/%7Esap/uploads/publications/Evers2015.pdf

Y. Ban, L. Girin, X. Alameda-Pineda, and R. Horaud, Exploiting the complementarity of audio and visual data in multi-speaker tracking, ICCV Workshop on Computer Vision for Audio-Visual Media, vol.3, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01577965

Z. Liang, X. Ma, and X. Dai, Robust tracking of moving sound source using multiple model Kalman filter, Applied Acoustics, vol.69, issue.12, pp.1350-1355, 2008.
DOI : 10.1016/j.apacoust.2007.11.010

J. Vermaak and A. Blake, Nonlinear filtering for speaker tracking in noisy and reverberant environments, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol.5, pp.3021-3024, 2001.
DOI : 10.1109/icassp.2001.940294

S. Ba, X. Alameda-Pineda, A. Xompero, and R. Horaud, An online variational Bayesian model for multi-person tracking from cluttered scenes, Computer Vision and Image Understanding, vol.153, pp.64-76, 2016.
DOI : 10.1016/j.cviu.2016.07.006
URL : https://hal.archives-ouvertes.fr/hal-01349763

I. Gebru, S. Ba, X. Li, and R. Horaud, Audio-visual speaker diarization based on spatiotemporal Bayesian fusion, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01413403

M. F. Fallon and S. J. Godsill, Acoustic source localization and tracking of a time-varying number of speakers, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.4, pp.1409-1415, 2012.

J. Valin, F. Michaud, and J. Rouat, Robust localization and tracking of simultaneous moving sound sources using beamforming and particle filtering, Robotics and Autonomous Systems, vol.55, issue.3, pp.216-228, 2007.

V. Cevher, R. Velmurugan, and J. H. McClellan, Acoustic multitarget tracking using direction-of-arrival batches, IEEE Transactions on Signal Processing, vol.55, issue.6, pp.2810-2825, 2007.

B. Vo, S. Singh, and W. K. Ma, Tracking multiple speakers using random sets, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol.2, p.357, 2004.

W. Ma, B. Vo, S. S. Singh, and A. Baddeley, Tracking an unknown time-varying number of speakers using TDOA measurements: A random finite set approach, IEEE Transactions on Signal Processing, vol.54, issue.9, pp.3291-3304, 2006.

B. Vo and W. Ma, The Gaussian mixture probability hypothesis density filter, IEEE Transactions on Signal Processing, vol.54, issue.11, pp.4091-4104, 2006.

X. Li, B. Mourgue, L. Girin, S. Gannot, and R. Horaud, Online localization of multiple moving speakers in reverberant environments, The Tenth IEEE Workshop on Sensor Array and Multichannel Signal Processing, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01795462

J. Kivinen and M. K. Warmuth, Exponentiated gradient versus gradient descent for linear predictors, Information and Computation, vol.132, issue.1, pp.1-63, 1997.

G. Xu, H. Liu, L. Tong, and T. Kailath, A least-squares approach to blind channel identification, IEEE Transactions on Signal Processing, vol.43, issue.12, pp.2982-2993, 1995.

X. Li, L. Girin, R. Horaud, and S. Gannot, Estimation of relative transfer function in the presence of stationary noise based on segmental power spectral density matrix subtraction, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.320-324, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01119186

A. L. Yuille and A. Rangarajan, The concave-convex procedure, Neural Computation, vol.15, issue.4, pp.915-936, 2003.

I. D. Gebru, X. Alameda-Pineda, F. Forbes, and R. Horaud, EM algorithms for weighted-data clustering with application to audio-visual scene analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, pp.2402-2415, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01261374

H. W. Löllmann, C. Evers, A. Schmidt, H. Mellmann, H. Barfuss et al., The LOCATA challenge data corpus for acoustic source localization and tracking, IEEE Sensor Array and Multichannel Signal Processing Workshop, 2018.

X. Li, R. Horaud, L. Girin, and S. Gannot, Voice activity detection based on statistical likelihood ratio with adaptive thresholding, IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), pp.1-5, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01349776

X. Li, L. Girin, F. Badeig, and R. Horaud, Reverberant sound localization with a robot head based on direct-path relative transfer function, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.2819-2826, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01349771