E. Vincent, T. Virtanen, and S. Gannot, Audio Source Separation and Speech Enhancement. Wiley, 2018.
URL: https://hal.archives-ouvertes.fr/hal-01881431

Y. Salaün, E. Vincent, N. Bertin, N. Souviraà-Labastie, X. Jaureguiberry et al., The Flexible Audio Source Separation Toolbox Version 2.0, in ICASSP Show & Tell, 2014.

K. Nakadai, H. G. Okuno, H. Nakajima, Y. Hasegawa, and H. Tsujino, An open source software system for robot audition HARK and its evaluation, in Humanoids, pp. 561-566, 2008.

F. Grondin, D. Létourneau, F. Ferland, V. Rousseau, and F. Michaud, The ManyEars open framework: Microphone array open software and open hardware system for robotic applications, Autonomous Robots, vol. 34, pp. 217-232, 2013.

B. Schuller, A. Lehmann, F. Weninger, F. Eyben, and G. Rigoll, Blind enhancement of the rhythmic and harmonic sections by NMF: Does it help?, pp. 361-364, 2009.

J. R. Hershey, Z. Chen, J. Le Roux, and S. Watanabe, Deep clustering: Discriminative embeddings for segmentation and separation, in ICASSP, pp. 31-35, 2016.

D. Yu, M. Kolbæk, Z. Tan, and J. Jensen, Permutation invariant training of deep models for speaker-independent multi-talker speech separation, in ICASSP, pp. 241-245, 2017.

Y. Luo and N. Mesgarani, TasNet: Time-domain audio separation network for real-time, single-channel speech separation, in ICASSP, pp. 696-700, 2018.

Y. Luo and N. Mesgarani, Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation, IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, no. 8, pp. 1256-1266, 2019.

N. Zeghidour and D. Grangier, Wavesplit: End-to-end speech separation by speaker clustering, arXiv preprint, 2020.

E. Manilow, P. Seetharaman, and B. Pardo, The Northwestern University Source Separation Library, in ISMIR, pp. 297-305, 2018.

Z. Ni and M. I. Mandel, Onssen: An open-source speech separation and enhancement library, arXiv preprint, 2019.

F. Stöter, S. Uhlich, A. Liutkus, and Y. Mitsufuji, Open-Unmix - A reference implementation for music source separation, J. Open Source Software, vol. 4, no. 41, p. 1667, 2019.

A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury et al., PyTorch: An imperative style, high-performance deep learning library, in NeurIPS, 2019.

E. Tzinis, S. Venkataramani, Z. Wang, C. Subakan, and P. Smaragdis, Two-step sound source separation: Training on learned latent targets, in ICASSP, pp. 31-35, 2020.

Y. Luo, Z. Chen, and T. Yoshioka, Dual-path RNN: Efficient long sequence modeling for time-domain single-channel speech separation, in ICASSP, pp. 46-50, 2020.

Y. Isik, J. Le Roux, Z. Chen, S. Watanabe, and J. R. Hershey, Single-channel multi-speaker separation using deep clustering, in Interspeech, pp. 545-549, 2016.

Z. Chen, Y. Luo, and N. Mesgarani, Deep attractor network for single-microphone speaker separation, in ICASSP, pp. 246-250, 2017.

J. Heitkaemper, D. Jakobeit, C. Boeddeker, L. Drude, and R. Haeb-Umbach, Demystifying TasNet: A dissecting approach, in ICASSP, pp. 6359-6363, 2020.

F. Bahmaninezhad, J. Wu, R. Gu, S. Zhang, Y. Xu et al., A comprehensive study of speech separation: Spectrogram vs waveform separation, in Interspeech, pp. 4574-4578, 2019.

I. Kavalerov, S. Wisdom, H. Erdogan, B. Patton, K. Wilson et al., Universal sound separation, in WASPAA, pp. 175-179, 2019.

M. Pariente, S. Cornell, A. Deleforge, and E. Vincent, Filterbank design for end-to-end speech separation, in ICASSP, pp. 6364-6368, 2020.
URL: https://hal.archives-ouvertes.fr/hal-02355623

D. Ditter and T. Gerkmann, A multi-phase gammatone filterbank for speech separation via TasNet, in ICASSP, pp. 36-40, 2020.

M. Ravanelli and Y. Bengio, Speaker recognition from raw waveform with SincNet, in SLT, pp. 1021-1028, 2018.

S. van der Walt, S. C. Colbert, and G. Varoquaux, The NumPy array: A structure for efficient numerical computation, Computing in Science and Engineering, vol. 13, no. 2, pp. 22-30, 2011.
URL: https://hal.archives-ouvertes.fr/inria-00564007

D. Griffin and J. Lim, Signal estimation from modified short-time Fourier transform, IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, pp. 236-243, 1984.

N. Perraudin, P. Balazs, and P. Søndergaard, A fast Griffin-Lim algorithm, in WASPAA, pp. 1-4, 2013.

D. Gunawan and D. Sen, Iterative phase estimation for the synthesis of separated sources from single-channel mixtures, IEEE Signal Process. Lett., vol. 17, no. 5, pp. 421-424, 2010.

J. Le Roux, S. Wisdom, H. Erdogan, and J. R. Hershey, SDR - half-baked or well done?, in ICASSP, pp. 626-630, 2019.

J. M. Martín-Doñas, A. M. Gomez, J. A. Gonzalez, and A. M. Peinado, A deep learning loss function based on the perceptual evaluation of the speech quality, IEEE Signal Process. Lett., vol. 25, no. 11, pp. 1680-1684, 2018.

M. Kolbæk, D. Yu, Z. Tan, and J. Jensen, Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks, IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 25, no. 10, pp. 1901-1913, 2017.

G. Wichern, J. Antognini, M. Flynn, L. R. Zhu, E. McQuinn et al., WHAM!: Extending speech separation to noisy environments, in Interspeech, pp. 1368-1372, 2019.

M. Maciejewski, G. Wichern, E. McQuinn, and J. Le Roux, WHAMR!: Noisy and reverberant single-channel speech separation, in ICASSP, pp. 696-700, 2020.

J. Cosentino, S. Cornell, M. Pariente, A. Deleforge, and E. Vincent, LibriMix: An open-source dataset for generalizable speech separation, arXiv preprint, 2020.

S. Wisdom, H. Erdogan, D. P. Ellis, R. Serizel, N. Turpault et al., What's all the fuss about free universal sound separation data?, arXiv preprint, 2020.

C. K. Reddy, E. Beyrami, H. Dubey, V. Gopal, R. Cheng et al., The Interspeech 2020 deep noise suppression challenge: Datasets, subjective speech quality and testing framework, arXiv preprint, 2020.

L. Drude, J. Heitkaemper, C. Boeddeker, and R. Haeb-Umbach, SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition, arXiv preprint, 2019.

S. Sivasankaran, E. Vincent, and D. Fohr, Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition, arXiv preprint, 2020.
URL: https://hal.archives-ouvertes.fr/hal-02355669

Z. Rafii, A. Liutkus, F. Stöter, S. I. Mimilakis, and R. Bittner, The MUSDB18 corpus for music separation, dataset, 2017.

W. Falcon, PyTorch Lightning, 2019.

L. Drude and R. Haeb-Umbach, Tight integration of spatial and spectral features for BSS with deep clustering embeddings, in Interspeech, pp. 2650-2654, 2017.

E. Vincent, R. Gribonval, and C. Févotte, Performance measurement in blind audio source separation, IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1462-1469, 2006.
URL: https://hal.archives-ouvertes.fr/inria-00544230

A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs, in ICASSP, vol. 2, pp. 749-752, 2001.

C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, An algorithm for intelligibility prediction of time-frequency weighted noisy speech, IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2125-2136, 2011.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, in ASRU, 2011.

Z. Wang, J. Le Roux, and J. R. Hershey, Alternative objective functions for deep clustering, in ICASSP, pp. 686-690, 2018.