Aurora-3, 2000.

J. M. Baker, L. Deng, J. Glass, S. Khudanpur, C. Lee et al., Research developments and directions in speech recognition and understanding, part 1, IEEE Signal Processing Magazine, vol.26, issue.3, pp.75-80, 2009.

J. Barker, R. Marxer, E. Vincent, and S. Watanabe, The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines, Proc. 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.504-511, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01211376

J. Barker, E. Vincent, N. Ma, H. Christensen, and P. Green, The PASCAL CHiME speech separation and recognition challenge, Computer Speech and Language, vol.27, issue.3, pp.621-633, 2013.
URL : https://hal.archives-ouvertes.fr/inria-00584051

P. Bell, M. J. Gales, T. Hain, J. Kilgour, P. Lanchantin et al., The MGB challenge: Evaluating multi-genre broadcast media recognition, Proc. 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.687-693, 2015.

N. Bertin, E. Camberlein, E. Vincent, R. Lebarbenchon, S. Peillon et al., A French corpus for distant-microphone speech processing in real homes, Proc. Interspeech, pp.2781-2785, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01343060

C. Blandin, A. Ozerov, and E. Vincent, Multi-source TDOA estimation in reverberant audio using angular spectra and clustering, Signal Processing, vol.92, issue.8, pp.1950-1960, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00576297

A. Brutti, L. Cristoforetti, W. Kellermann, L. Marquardt, and M. Omologo, WOZ acoustic data collection for interactive TV, Proc. 6th Int. Conf. on Language Resources and Evaluation (LREC), pp.2330-2334, 2008.

A. Brutti, M. Ravanelli, P. Svaizer, and M. Omologo, A speech event detection and localization task for multiroom environments, Proc. 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), pp.157-161, 2014.

I. Cohen and J. Benesty, Speech Processing in Modern Communication: Challenges and Perspectives, 2010.
URL : https://hal.archives-ouvertes.fr/pasteur-00836177

L. Cristoforetti, M. Ravanelli, M. Omologo, A. Sosi, A. Abad et al., The DIRHA simulated corpus, Proc. 9th Int. Conf. on Language Resources and Evaluation (LREC), pp.2629-2634, 2014.

J. DiBiase, H. Silverman, and M. Brandstein, Robust localization in reverberant rooms, Microphone Arrays: Signal Processing Techniques and Applications, pp.157-180, 2001.

C. Fox, Y. Liu, E. Zwyssig, and T. Hain, The Sheffield wargames corpus, Proc. Interspeech, pp.1116-1120, 2013.

S. Galliano, E. Geoffrois, G. Gravier, J. Bonastre, D. Mostefa et al., Corpus description of the ESTER evaluation campaign for the rich transcription of French broadcast news, Proc. 5th Int. Conf. on Language Resources and Evaluation (LREC), pp.139-142, 2006.

G. Gravier, G. Adda, N. Paulsson, M. Carré, A. Giraudel et al., The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, Proc. 8th Int. Conf. on Language Resources and Evaluation (LREC), pp.114-118, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00712591

J. H. Hansen, P. Angkititrakul, J. Plucienkowski, S. Gallant, and U. Yapanel, CU-Move: Analysis & corpus development for interactive in-vehicle speech systems, Proc. Eurospeech, pp.2023-2026, 2001.

A. Janin, D. Baron, J. Edwards, D. Ellis, D. Gelbart et al., The ICSI meeting corpus, Proc. 2003 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp.364-367, 2003.

C. Knapp and G. Carter, The generalized cross-correlation method for estimation of time delay, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.24, issue.4, pp.320-327, 1976.

L. Lamel, F. Schiel, A. Fourcin, J. Mariani, and H. Tillman, The translingual English database (TED), Proc. 3rd Int. Conf. on Spoken Language Processing (ICSLP), pp.1795-1798, 1994.

B. Lee, M. Hasegawa-Johnson, C. Goudeseune, S. Kamdar, S. Borys et al., AVICAR: audio-visual speech corpus in a car environment, Proc. Interspeech, pp.2489-2492, 2004.

J. Li, L. Deng, R. Haeb-Umbach, and Y. Gong, Robust Automatic Speech Recognition: A Bridge to Practical Applications, 2015.

M. Lincoln, I. McCowan, J. Vepa, and H. K. Maganti, The multi-channel Wall Street Journal audio visual corpus (MCWSJ-AV): Specification and initial experiments, Proc. 2005 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.357-362, 2005.

Lincoln Laboratory speech enhancement corpus (LLSEC), 1996.

D. Mostefa, N. Moreau, K. Choukri, G. Potamianos, S. Chu et al., The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms, Language Resources and Evaluation, vol.41, issue.3-4, pp.389-407, 2007.

A. Ozerov and E. Vincent, Using the FASST source separation toolbox for noise robust speech recognition, Proc. Int. Workshop on Machine Listening in Multisource Environments (CHiME), pp.86-87, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00598734

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, Proc. 2011 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2011.

M. Ravanelli, L. Cristoforetti, R. Gretter, M. Pellin, A. Sosi et al., The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments, Proc. 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.275-282, 2015.

M. Ravanelli, P. Svaizer, and M. Omologo, Realistic multi-microphone data simulation for distant speech recognition, Proc. Interspeech, pp.2786-2790, 2016.

S. Renals, T. Hain, and H. Bourlard, Interpretation of multiparty meetings: The AMI and AMIDA projects, Proc. 2nd Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), pp.115-118, 2008.

Y. Salaün, E. Vincent, N. Bertin, N. Souviraà-Labastie, X. Jaureguiberry et al., The Flexible Audio Source Separation Toolbox Version 2.0, ICASSP Show & Tell, 2014.

A. Stupakov, E. Hanusa, D. Vijaywargi, D. Fox, and J. Bilmes, The design and collection of COSINE, a multi-microphone in situ speech corpus recorded in noisy environments, Computer Speech and Language, vol.26, issue.1, pp.52-66, 2011.

Y. Tachioka, S. Watanabe, J. Le Roux, and J. R. Hershey, Discriminative methods for noise robust speech recognition: A CHiME challenge benchmark, Proc. 2nd Int. Workshop on Machine Listening in Multisource Environments (CHiME), pp.19-24, 2013.

A. Tsiami, I. Rodomagoulakis, P. Giannoulis, A. Katsamanis, G. Potamianos et al., ATHENA: a Greek multi-sensory database for home automation control, pp.1608-1612, 2014.

M. Vacher, B. Lecouteux, P. Chahuara, F. Portet, B. Meillon et al., The Sweet-Home speech and multimodal corpus for home automation interaction, Proc. 9th Int. Conf. on Language Resources and Evaluation (LREC), pp.4499-4509, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00953006

E. Vincent, An experimental evaluation of Wiener filter smoothing techniques applied to under-determined audio source separation, Proc. 9th Int. Conf. on Latent Variable Analysis and Signal Separation (LVA/ICA), pp.157-164, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00544035

E. Vincent, J. Barker, S. Watanabe, J. Le Roux, F. Nesta et al., The second CHiME speech separation and recognition challenge: An overview of challenge systems and outcomes, Proc. 2013 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.162-167, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00862750

E. Vincent, N. Bertin, and R. Badeau, Adaptive harmonic spectral decomposition for multiple pitch estimation, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.3, pp.528-537, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00350163

E. Vincent, R. Gribonval, and C. Févotte, Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.14, issue.4, pp.1462-1469, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00544230

E. Vincent and T. Virtanen, Audio Source Separation and Speech Enhancement, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01881431

T. Virtanen and R. Singh, Techniques for Noise Robustness in Automatic Speech Recognition, 2012.

F. Weninger, H. Erdogan, S. Watanabe, E. Vincent, J. Le Roux et al., Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR, Proc. 12th Int. Conf. on Latent Variable Analysis and Signal Separation (LVA/ICA), pp.91-99, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01163493

M. Wölfel and J. McDonough, Distant Speech Recognition, 2009.

W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer et al., Achieving human parity in conversational speech recognition, 2016.

T. Yoshioka, N. Ito, M. Delcroix, A. Ogawa, K. Kinoshita et al., The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices, pp.436-443, 2015.

D. Yu, L. Deng, E. Zwyssig, M. Ravanelli, P. Svaizer et al., A multi-channel corpus for distant-speech interaction in presence of known interferences, Proc. 2015 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp.4480-4484, 2015.