S. Gannot, E. Vincent, S. Markovich-Golan, and A. Ozerov, A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.4, pp.692-730, 2017.
DOI : 10.1109/TASLP.2016.2647702
URL : https://hal.archives-ouvertes.fr/hal-01414179

S. Ba, X. Alameda-Pineda, A. Xompero, and R. Horaud, An on-line variational Bayesian model for multi-person tracking from cluttered scenes, Computer Vision and Image Understanding, vol.153, pp.64-76, 2016.
DOI : 10.1016/j.cviu.2016.07.006
URL : https://hal.archives-ouvertes.fr/hal-01349763

Y. Ban, S. Ba, X. Alameda-Pineda, and R. Horaud, Tracking Multiple Persons Based on a Variational Bayesian Model, ECCV Workshops, pp.52-67, 2016.
DOI : 10.1007/978-3-540-69568-4_1
URL : https://hal.archives-ouvertes.fr/hal-01359559

X. Alameda-Pineda and R. Horaud, Vision-guided robot hearing, The International Journal of Robotics Research, vol.5, issue.1, pp.437-456, 2015.
DOI : 10.1080/01691864.2012.687152
URL : https://hal.archives-ouvertes.fr/hal-00990766

I. D. Gebru, X. Alameda-Pineda, F. Forbes, and R. Horaud, EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.12, pp.2402-2415, 2016.
DOI : 10.1109/TPAMI.2016.2522425
URL : https://hal.archives-ouvertes.fr/hal-01261374

M. Barnard, W. Wang, A. Hilton, and J. Kittler, Mean-shift and sparse sampling-based SMC-PHD filtering for audio informed visual speaker tracking, IEEE Transactions on Multimedia, vol.18, issue.12, pp.2417-2431, 2016.

Y. Liu, W. Wang, J. Chambers, V. Kilic, and A. Hilton, Particle flow SMC-PHD filter for audiovisual multi-speaker tracking, International Conference on Latent Variable Analysis and Signal Separation, pp.344-353, 2017.
DOI : 10.1007/978-3-319-53547-0_33

V. Kılıç, M. Barnard, W. Wang, and J. Kittler, Audio Assisted Robust Visual Tracking With Adaptive Particle Filtering, IEEE Transactions on Multimedia, vol.17, issue.2, pp.186-200, 2015.
DOI : 10.1109/TMM.2014.2377515

D. Gatica-Perez, G. Lathoud, J. Odobez, and I. McCowan, Audiovisual Probabilistic Tracking of Multiple Speakers in Meetings, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.2, pp.601-616, 2007.
DOI : 10.1109/TASL.2006.881678

X. Qian, A. Brutti, M. Omologo, and A. Cavallaro, 3D audio-visual speaker tracking with an adaptive particle filter, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), p.2896, 2017.
DOI : 10.1109/ICASSP.2017.7952686

S. M. Naqvi, W. Wang, M. S. Khan, M. Barnard, and J. A. Chambers, Multimodal (audio-visual) source separation exploiting multi-speaker tracking, robust beamforming and time-frequency masking, IET Signal Processing, vol.6, issue.5, pp.466-477, 2012.
DOI : 10.1049/iet-spr.2011.0124

N. Schult, T. Reineking, T. Kluss, and C. Zetzsche, Information-Driven Active Audio-Visual Source Localization, PLOS ONE, vol.106, issue.3, 2015.
DOI : 10.1371/journal.pone.0137057
URL : https://doi.org/10.1371/journal.pone.0137057

Y. Ban, L. Girin, X. Alameda-Pineda, and R. Horaud, Exploiting the Complementarity of Audio and Visual Data in Multi-speaker Tracking, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017.
DOI : 10.1109/ICCVW.2017.60
URL : https://hal.archives-ouvertes.fr/hal-01577965

X. Li, L. Girin, R. Horaud, and S. Gannot, Estimation of the Direct-Path Relative Transfer Function for Supervised Sound-Source Localization, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.11, pp.2171-2186, 2016.
DOI : 10.1109/TASLP.2016.2598319
URL : https://hal.archives-ouvertes.fr/hal-01349691

X. Li, L. Girin, R. Horaud, and S. Gannot, Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization With Spatial Sparsity Regularization, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.10, p.1997, 2017.
DOI : 10.1109/TASLP.2017.2740001
URL : https://hal.archives-ouvertes.fr/hal-01413417

Y. Avargel and I. Cohen, System Identification in the Short-Time Fourier Transform Domain With Crossband Filtering, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.4, pp.1305-1319, 2007.
DOI : 10.1109/TASL.2006.889720

R. Talmon, I. Cohen, and S. Gannot, Relative Transfer Function Identification Using Convolutive Transfer Function Approximation, IEEE Transactions on Audio, Speech, and Language Processing, vol.17, issue.4, pp.546-555, 2009.
DOI : 10.1109/TASL.2008.2009576

X. Li, L. Girin, R. Horaud, and S. Gannot, Estimation of relative transfer function in the presence of stationary noise based on segmental power spectral density matrix subtraction, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.320-324, 2015.
DOI : 10.1109/ICASSP.2015.7177983
URL : https://hal.archives-ouvertes.fr/hal-01119186

I. D. Gebru, S. Ba, X. Li, and R. Horaud, Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, 2017.
DOI : 10.1109/TPAMI.2017.2648793
URL : https://hal.archives-ouvertes.fr/hal-01413403

A. Milan, L. Leal-Taixé, I. Reid, S. Roth, and K. Schindler, MOT16: A benchmark for multi-object tracking, 2016.