G. J. Brown and M. Cooke, Computational auditory scene analysis, Computer Speech & Language, vol.8, pp.297-336, 1994.

D. Wang and G. J. Brown, Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, 2006.

J. R. Hershey, Z. Chen, J. Le Roux, and S. Watanabe, Deep clustering: Discriminative embeddings for segmentation and separation, ICASSP, pp.31-35, 2016.

Z. Chen, Y. Luo, and N. Mesgarani, Deep attractor network for single-microphone speaker separation, pp.246-250, 2017.

K. Kinoshita, L. Drude, M. Delcroix, and T. Nakatani, Listening to each speaker one by one with recurrent selective hearing networks, ICASSP, pp.5064-5068, 2018.

² The speech separation model and the AM trained using true DOA values were used, since the corresponding models trained using estimated DOAs performed poorly.

M. Kolbæk, D. Yu, Z.-H. Tan, and J. Jensen, Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.10, pp.1901-1913, 2017.

M. Brandstein and D. Ward (Eds.), Microphone Arrays: Signal Processing Techniques and Applications, Digital Signal Processing series, Springer, 2001.

S. Gannot, E. Vincent, S. Markovich-Golan, and A. Ozerov, A consolidated perspective on multimicrophone speech enhancement and source separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.4, pp.692-730, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01414179

Z. Wang and D. Wang, Combining spectral and spatial features for deep learning based blind speaker separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, pp.457-468, 2019.

L. Perotin, R. Serizel, E. Vincent, and A. Guérin, Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings, ICASSP, pp.36-40, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01699759

Z. Chen, X. Xiao, T. Yoshioka, H. Erdogan, J. Li et al., Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network, IEEE Spoken Language Technology Workshop (SLT), pp.558-565, 2018.

M. Taseska and E. A. P. Habets, DOA-informed source extraction in the presence of competing talkers and background noise, EURASIP Journal on Advances in Signal Processing, vol.2017, issue.1, p.60, 2017.

Z. Chen, J. Li, X. Xiao, T. Yoshioka, H. Wang et al., Cracking the cocktail party problem by multi-beam deep attractor network, ASRU, pp.437-444, 2017.

H. Barfuss and W. Kellermann, On the impact of localization errors on HRTF-based robust least-squares beamforming, pp.1072-1075, 2016.

J. Barker, S. Watanabe, E. Vincent, and J. Trmal, The fifth 'CHiME' speech separation and recognition challenge: Dataset, task and baselines, pp.1561-1565, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01744021

C. Knapp and G. Carter, The generalized correlation method for estimation of time delay, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.24, issue.4, pp.320-327, 1976.

S. Sivasankaran, E. Vincent, and D. Fohr, Keyword-based speaker localization: Localizing a target speaker in a multi-speaker environment, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01817519

M. Wölfel and J. McDonough, Distant Speech Recognition, 2009.

E. Warsitz and R. Haeb-Umbach, Blind acoustic beamforming based on generalized eigenvalue decomposition, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.5, pp.1529-1539, 2007.

A. Spriet, M. Moonen, and J. Wouters, Spatially pre-processed speech distortion weighted multi-channel Wiener filtering for noise reduction, Signal Processing, vol.84, issue.12, pp.2367-2387, 2004.

Z. Wang, E. Vincent, R. Serizel, and Y. Yan, Rank-1 constrained multichannel Wiener filter for speech recognition in noisy environments, Computer Speech & Language, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01634449

E. A. P. Habets, RIR-Generator: Room impulse response generator, 2018.

N. Ryant, K. Church, C. Cieri, A. Cristia, J. Du et al., The second DIHARD diarization challenge: Dataset, task, and baselines, 2019.

Z. Wang, J. Le Roux, and J. R. Hershey, Multi-Channel deep clustering: Discriminative spectral and spatial embeddings for speaker-independent speech separation, ICASSP, pp.1-5, 2018.

D. Povey, V. Peddinti, D. Galvez, P. Ghahremani, V. Manohar et al., Purely sequence-trained neural networks for ASR based on lattice-free MMI, pp.2751-2755, 2016.