D. Wang and J. Chen, Supervised speech separation based on deep learning: An overview, Speech, and Language Processing, vol.26, pp.1702-1726, 2018.

Y. Wang, K. Han, and D. Wang, Exploring monaural features for classification-based speech segregation, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.2, pp.270-279, 2013.

Y. Jiang, D. Wang, R. Liu, and Z. Feng, Binaural classification for reverberant speech segregation using deep neural networks, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol.22, issue.12, pp.2112-2121, 2014.

F. Weninger, F. Eyben, and B. Schuller, Single-channel speech separation with memory-enhanced recurrent neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3709-3713, 2014.

J. Chen and D. Wang, Long short-term memory for speaker generalization in supervised speech separation, The Journal of the Acoustical Society of America, vol.141, issue.6, pp.4705-4714, 2017.

J. Heymann, L. Drude, and R. Haeb-umbach, Neural network based spectral mask estimation for acoustic beamforming, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.196-200, 2016.

C. Boeddeker, H. Erdogan, T. Yoshioka, and R. Haeb-umbach, Exploring practical aspects of neural mask-based beamforming for far-field speech recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.6697-6701, 2018.

X. Zhang and D. Wang, Deep learning based binaural speech separation in reverberant environments, IEEE/ACM transactions on audio, vol.25, issue.5, pp.1075-1084, 2017.

Z. Wang and D. Wang, Combining spectral and spatial features for deep learning based blind speaker separation, Speech, and Language Processing, vol.27, pp.457-468, 2019.

P. Pertilä and J. Nikunen, Distant speech separation using predicted time-frequency masks from spatial features, Speech communication, vol.68, pp.97-106, 2015.

S. Chakrabarty, D. Wang, and E. A. Habets, Time-frequency masking based online speech enhancement with multi-channel data using convolutional neural networks, International Workshop on Acoustic Signal Enhancement (IWAENC), pp.476-480, 2018.

X. Li, L. Girin, S. Gannot, and R. Horaud, Non-stationary noise power spectral density estimation based on regional statistics, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.181-185, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01250892

T. Gerkmann and R. C. Hendriks, Unbiased MMSE-based noise power estimation with low complexity and low tracking delay, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.4, pp.1383-1393, 2012.

S. Gannot, D. Burshtein, and E. Weinstein, Signal enhancement using beamforming and nonstationarity with applications to speech, IEEE Transactions on, vol.49, issue.8, pp.1614-1626, 2001.

X. Li, L. Girin, R. Horaud, and S. Gannot, Estimation of relative transfer function in the presence of stationary noise based on segmental power spectral density matrix subtraction, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.320-324, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01119186

X. Li, S. Leglaive, L. Girin, and R. Horaud, Audio-noise power spectral density estimation using long short-term memory, IEEE Signal Processing Letters, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02100059

A. Schwarz and W. Kellermann, Coherent-to-diffuse power ratio estimation for dereverberation, Speech, and Language Processing, vol.23, pp.1006-1018, 2015.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural computation, vol.9, issue.8, pp.1735-1780, 1997.

J. Barker, R. Marxer, E. Vincent, and S. Watanabe, The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.504-511, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01211376

F. Chollet, Keras, 2015.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations, 2015.

A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.2, pp.749-752, 2001.

C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, An algorithm for intelligibility prediction of time-frequency weighted noisy speech, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.7, pp.2125-2136, 2011.

X. Anguera, C. Wooters, and J. Hernando, Acoustic beamforming for speaker diarization of meetings, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.7, pp.2011-2022, 2007.