J. R. Hershey, Z. Chen, J. Le Roux, and S. Watanabe, Deep clustering: Discriminative embeddings for segmentation and separation, ICASSP, pp. 31-35, 2016.

K. Kinoshita, L. Drude, M. Delcroix, and T. Nakatani, Listening to each speaker one by one with recurrent selective hearing networks, ICASSP, pp. 5064-5068, 2018.

M. Kolbæk, D. Yu, Z.-H. Tan, and J. Jensen, Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 10, pp. 1901-1913, 2017.

Y. Luo and N. Mesgarani, Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, pp. 1256-1266, 2019.

D. Wang and G. J. Brown, Eds., Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, Wiley-IEEE Press, 2006.

Z. Wang and D. Wang, Combining spectral and spatial features for deep learning based blind speaker separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, pp. 457-468, 2019.

T. Higuchi, K. Kinoshita, M. Delcroix, K. Zmolíková, and T. Nakatani, Deep clustering-based beamforming for separation with unknown number of sources, Interspeech, pp. 1183-1187, 2017.

L. Perotin, R. Serizel, E. Vincent, and A. Guérin, Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings, ICASSP, pp. 36-40, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01699759

Z. Chen, X. Xiao, T. Yoshioka, H. Erdogan, J. Li et al., Multi-channel overlapped speech recognition with location guided speech extraction network, IEEE Spoken Language Technology Workshop (SLT), pp. 558-565, 2018.

S. Sivasankaran, E. Vincent, and D. Fohr, Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition, EUSIPCO (Submitted), 2020.
URL : https://hal.archives-ouvertes.fr/hal-02355669

N. Delfosse and P. Loubaton, Adaptive blind separation of independent sources: A deflation approach, Signal Processing, vol. 45, no. 1, pp. 59-83, 1995.

N. Takahashi, S. Parthasaarathy, N. Goswami, and Y. Mitsufuji, Recursive speech separation for unknown number of speakers, Interspeech, 2019.

S. Gannot, E. Vincent, S. Markovich-Golan, and A. Ozerov, A consolidated perspective on multimicrophone speech enhancement and source separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 4, pp. 692-730, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01414179

S. Sivasankaran, E. Vincent, and D. Fohr, Keyword-based speaker localization: Localizing a target speaker in a multi-speaker environment, Interspeech, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01817519

S. Adavanne, A. Politis, and T. Virtanen, Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network, EUSIPCO, pp. 1462-1466, 2018.

Z. Wang, E. Vincent, R. Serizel, and Y. Yan, Rank-1 constrained multichannel Wiener filter for speech recognition in noisy environments, Computer Speech & Language, vol. 49, pp. 37-51, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01634449

J. Barker, S. Watanabe, E. Vincent, and J. Trmal, The fifth 'CHiME' speech separation and recognition challenge: Dataset, task and baselines, Interspeech, pp. 1561-1565, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01744021

N. Ryant, K. Church, C. Cieri, A. Cristia, J. Du et al., The second DIHARD diarization challenge: Dataset, task, and baselines, Interspeech, 2019.

E. A. P. Habets, RIR-Generator: Room impulse response generator, 2018.

T. Menne, I. Sklyar, R. Schlüter, and H. Ney, Analysis of deep clustering as preprocessing for automatic speech recognition of sparsely overlapping speech, ICASSP, 2019.

C. Knapp and G. Carter, The generalized correlation method for estimation of time delay, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, no. 4, pp. 320-327, 1976.

D. Povey, V. Peddinti, D. Galvez, P. Ghahremani, V. Manohar et al., Purely sequence-trained neural networks for ASR based on lattice-free MMI, Interspeech, pp. 2751-2755, 2016.