Y. Ephraim and D. Malah, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech and Signal Processing, vol.32, issue.6, pp.1109-1121, 1984.

I. Cohen and B. Berdugo, Speech enhancement for nonstationary noise environments, Signal processing, vol.81, issue.11, pp.2403-2418, 2001.

D. Wang and J. Chen, Supervised speech separation based on deep learning: An overview, Speech, and Language Processing, vol.26, pp.1702-1726, 2018.

Y. Wang, A. Narayanan, and D. Wang, On training targets for supervised speech separation, Speech, and Language Processing, vol.22, pp.1849-1858, 2014.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural computation, vol.9, issue.8, pp.1735-1780, 1997.

K. Cho, B. Van-merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433235

F. Weninger, F. Eyben, and B. Schuller, Single-channel speech separation with memory-enhanced recurrent neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3709-3713, 2014.

J. Chen and D. Wang, Long short-term memory for speaker generalization in supervised speech separation, The Journal of the Acoustical Society of America, vol.141, issue.6, pp.4705-4714, 2017.

L. Sun, J. Du, L. Dai, and C. Lee, Multiple-target deep learning for lstm-rnn based speech enhancement, 2017 Handsfree Speech Communications and Microphone Arrays, pp.136-140, 2017.

Y. Xia, S. Braun, C. K. Reddy, H. Dubey, R. Cutler et al., Weighted speech distortion losses for neural-networkbased real-time speech enhancement, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.871-875, 2020.

K. Tan and D. Wang, A convolutional recurrent neural network for real-time speech enhancement, Interspeech, vol.2018, pp.3229-3233, 2018.

A. Li, C. Zheng, and X. Li, Convolutional recurrent neural network based progressive learning for monaural speech enhancement, 2019.

T. Gerkmann and R. C. Hendriks, Unbiased MMSE-based noise power estimation with low complexity and low tracking delay, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.4, pp.1383-1393, 2012.

X. Li, L. Girin, S. Gannot, and R. Horaud, Non-stationary noise power spectral density estimation based on regional statistics, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.181-185, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01250892

X. Li, S. Leglaive, L. Girin, and R. Horaud, Audio-noise power spectral density estimation using long short-term memory, IEEE Signal Processing Letters, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02100059

Y. Wang, K. Han, and D. Wang, Exploring monaural features for classification-based speech segregation, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.2, pp.270-279, 2013.

J. S. Turek, S. Jain, M. Capota, A. G. Huth, and T. L. Willke, A single-layer RNN can approximate stacked and bidirectional RNNs, and topologies in between, 2019.

X. Li and R. Horaud, Multichannel speech enhancement based on time-frequency masking using subband long short-term memory, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp.298-302, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02264247

D. S. Williamson, Y. Wang, and D. Wang, Complex ratio masking for monaural speech separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.3, pp.483-492, 2016.

C. K. Reddy, E. Beyrami, H. Dubey, V. Gopal, R. Cheng et al., The Interspeech 2020 deep noise suppression challenge: Datasets, subjective speech quality and testing framework, 2020.

E. Hadad, F. Heese, P. Vary, and S. Gannot, Multichannel audio database in various acoustic environments, IEEE International Workshop on Acoustic Signal Enhancement, pp.313-317, 2014.

K. Kinoshita, M. Delcroix, S. Gannot, E. A. Habets, R. Haeb-umbach et al., A summary of the reverb challenge: state-of-theart and remaining challenges in reverberant speech processing research, EURASIP Journal on Advances in Signal Processing, vol.2016, issue.1, p.7, 2016.

F. Chollet, Keras, 2015.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations, 2015.

A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.2, pp.749-752, 2001.

C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, An algorithm for intelligibility prediction of time-frequency weighted noisy speech, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.7, pp.2125-2136, 2011.

E. Vincent, R. Gribonval, and C. Févotte, Performance measurement in blind audio source separation, Speech, and Language Processing, vol.14, pp.1462-1469, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00544230