D. Wang and J. Chen, Supervised speech separation based on deep learning: An overview, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.26, pp.1702-1726, 2018.

Y. Wang, K. Han, and D. Wang, Exploring monaural features for classification-based speech segregation, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.2, pp.270-279, 2013.

Y. Jiang, D. Wang, R. Liu, and Z. Feng, Binaural classification for reverberant speech segregation using deep neural networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.12, pp.2112-2121, 2014.

K. Tan, J. Chen, and D. Wang, Gated residual networks with dilated convolutions for monaural speech enhancement, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, issue.1, pp.189-198, 2019.

F. Weninger, F. Eyben, and B. Schuller, Single-channel speech separation with memory-enhanced recurrent neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3709-3713, 2014.

J. Chen and D. Wang, Long short-term memory for speaker generalization in supervised speech separation, The Journal of the Acoustical Society of America, vol.141, issue.6, pp.4705-4714, 2017.

J. Heymann, L. Drude, and R. Haeb-umbach, Neural network based spectral mask estimation for acoustic beamforming, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.196-200, 2016.

C. Boeddeker, H. Erdogan, T. Yoshioka, and R. Haeb-umbach, Exploring practical aspects of neural mask-based beamforming for far-field speech recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.6697-6701, 2018.

X. Zhang and D. Wang, Deep learning based binaural speech separation in reverberant environments, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.5, pp.1075-1084, 2017.

Z. Wang and D. Wang, Combining spectral and spatial features for deep learning based blind speaker separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, pp.457-468, 2019.

T. Yoshioka, Z. Chen, C. Liu, X. Xiao, H. Erdogan et al., Low-latency speaker-independent continuous speech separation, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6980-6984, 2019.

P. Pertilä and J. Nikunen, Distant speech separation using predicted time-frequency masks from spatial features, Speech Communication, vol.68, pp.97-106, 2015.

S. Chakrabarty and E. A. Habets, Time-frequency masking based online multi-channel speech enhancement with convolutional recurrent neural networks, IEEE Journal of Selected Topics in Signal Processing, vol.13, issue.4, pp.787-799, 2019.

X. Xiao, S. Watanabe, H. Erdogan, L. Lu, J. Hershey et al., Deep beamforming networks for multi-channel speech recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5745-5749, 2016.

T. N. Sainath, R. J. Weiss, K. W. Wilson, B. Li, A. Narayanan et al., Multichannel signal processing with deep neural networks for automatic speech recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.5, pp.965-979, 2017.

X. Li, L. Girin, S. Gannot, and R. Horaud, Non-stationary noise power spectral density estimation based on regional statistics, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.181-185, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01250892

T. Gerkmann and R. C. Hendriks, Unbiased MMSE-based noise power estimation with low complexity and low tracking delay, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.4, pp.1383-1393, 2012.

S. Gannot, D. Burshtein, and E. Weinstein, Signal enhancement using beamforming and nonstationarity with applications to speech, IEEE Transactions on Signal Processing, vol.49, issue.8, pp.1614-1626, 2001.

X. Li, L. Girin, R. Horaud, and S. Gannot, Estimation of relative transfer function in the presence of stationary noise based on segmental power spectral density matrix subtraction, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.320-324, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01119186

X. Li, S. Leglaive, L. Girin, and R. Horaud, Audio-noise power spectral density estimation using long short-term memory, IEEE Signal Processing Letters, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02100059

A. Schwarz and W. Kellermann, Coherent-to-diffuse power ratio estimation for dereverberation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, pp.1006-1018, 2015.

Y. Ephraim and D. Malah, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.32, issue.6, pp.1109-1121, 1984.

I. Cohen and B. Berdugo, Speech enhancement for non-stationary noise environments, Signal Processing, vol.81, issue.11, pp.2403-2418, 2001.

M. Brandstein and D. Ward, Microphone arrays: signal processing techniques and applications, 2013.

X. Li and R. Horaud, Multichannel speech enhancement based on time-frequency masking using subband long short-term memory, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02264247

Y. Wang, A. Narayanan, and D. Wang, On training targets for supervised speech separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, pp.1849-1858, 2014.

D. S. Williamson, Y. Wang, and D. Wang, Complex ratio masking for monaural speech separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.3, pp.483-492, 2016.

J. Barker, R. Marxer, E. Vincent, and S. Watanabe, The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.504-511, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01211376

F. Chollet, Keras, 2015.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations, 2015.

A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.2, pp.749-752, 2001.

C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, An algorithm for intelligibility prediction of time-frequency weighted noisy speech, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.7, pp.2125-2136, 2011.

J. F. Santos, M. Senoussaoui, and T. H. Falk, An improved non-intrusive intelligibility metric for noisy and reverberant speech, International Workshop on Acoustic Signal Enhancement (IWAENC), pp.55-59, 2014.

T. Hori, Z. Chen, J. R. Hershey, J. Le Roux, V. Mitra et al., The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.475-481, 2015.

X. Anguera, C. Wooters, and J. Hernando, Acoustic beamforming for speaker diarization of meetings, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.7, pp.2011-2022, 2007.