T. Virtanen, M. D. Plumbley, and D. Ellis, Computational Analysis of Sound Scenes and Events, Springer, 2018.

K. J. Piczak, Environmental sound classification with convolutional neural networks, 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), pp.1-6, 2015.

B. McFee, J. Salamon, and J. P. Bello, Adaptive pooling operators for weakly labeled sound event detection, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.26, issue.11, pp.2180-2193, 2018.

E. Çakır, G. Parascandolo, T. Heittola, H. Huttunen, and T. Virtanen, Convolutional recurrent neural networks for polyphonic sound event detection, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.6, pp.1291-1303, 2017.

W. Wei, H. Zhu, E. Benetos, and Y. Wang, A-CRNN: A domain adaptation model for sound event detection, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.276-280, 2020.

J. Yan, Y. Song, L. Dai, and I. McLoughlin, Task-aware mean teacher method for large scale weakly labeled semi-supervised sound event detection, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.326-330, 2020.

Y. Wang, J. Li, and F. Metze, A comparison of five multiple instance learning pooling functions for sound event detection with weak labeling, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.31-35, 2019.

A. Tarvainen and H. Valpola, Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results, Advances in Neural Information Processing Systems, vol.30, pp.1195-1204, 2017.

L. Lin, X. Wang, H. Liu, and Y. Qian, Guided learning for weakly-labeled semi-supervised sound event detection, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.626-630, 2020.

L. Delphin-Poulat and C. Plapous, Mean teacher with data augmentation for DCASE 2019 task 4, DCASE 2019 Technical Report, 2019.

Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle et al., Domain-adversarial training of neural networks, The Journal of Machine Learning Research, vol.17, issue.1, pp.2096-2030, 2016. URL: https://hal.archives-ouvertes.fr/hal-01624607

Y. Wang, P. Getreuer, T. Hughes, R. F. Lyon, and R. A. Saurous, Trainable frontend for robust and far-field keyword spotting, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5670-5674, 2017.

N. Turpault, R. Serizel, A. P. Shah, and J. Salamon, Sound event detection in domestic environments with weakly labeled data and soundscape synthesis, Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019. URL: https://hal.archives-ouvertes.fr/hal-02160855

G. Dekkers, S. Lauwereins, B. Thoen, M. W. Adhana, H. Brouckxon et al., The SINS database for detection of daily activities in a home environment using an acoustic sensor network, 2017.

A. Mesaros, T. Heittola, and T. Virtanen, TUT database for acoustic scene classification and sound event detection, 24th European Signal Processing Conference (EUSIPCO), 2016.

S. Wisdom, H. Erdogan, D. P. Ellis, R. Serizel, N. Turpault et al., What's all the fuss about free universal sound separation data?, 2020.

I. Kavalerov, S. Wisdom, H. Erdogan, B. Patton, K. Wilson et al., Universal sound separation, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp.175-179, 2019.

Y. Luo and N. Mesgarani, Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, issue.8, pp.1256-1266, 2019.

M. Pariente, S. Cornell, J. Cosentino, S. Sivasankaran, E. Tzinis et al., Asteroid: The PyTorch-based audio source separation toolkit for researchers, 2020. URL: https://hal.archives-ouvertes.fr/hal-02962964

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets, Advances in Neural Information Processing Systems, pp.2672-2680, 2014.

D. S. Park, W. Chan, Y. Zhang, C. Chiu, B. Zoph et al., SpecAugment: A simple data augmentation method for automatic speech recognition, 2019.

V. Lostanlen, J. Salamon, M. Cartwright, B. McFee, A. Farnsworth et al., Per-channel energy normalization: Why and how, IEEE Signal Processing Letters, vol.26, issue.1, pp.39-43, 2018.

M. Olvera, E. Vincent, R. Serizel, and G. Gasso, Foreground-background ambient sound scene separation, 2020. URL: https://hal.archives-ouvertes.fr/hal-02567542

T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux et al., Bidirectional LSTM-HMM hybrid system for polyphonic sound event detection, Proceedings of the Detection and Classification of Acoustic Scenes and Events, pp.35-39, 2016.