J. P. Bello, C. Silva, O. Nov, R. L. Dubois, A. Arora et al., SONYC: A system for the monitoring, analysis and mitigation of urban noise pollution, Communications of the ACM, 2018.

R. Radhakrishnan, A. Divakaran, and A. Smaragdis, Audio analysis for surveillance applications, Proc. WASPAA. IEEE, pp.158-161, 2005.

R. Serizel, V. Bisot, S. Essid, and G. Richard, Machine listening techniques as a complement to video image analysis in forensics, IEEE International Conference on Image Processing, pp.948-952, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01393959

Q. Jin, P. Schulam, S. Rawat, S. Burger, D. Ding et al., Event-based video retrieval using audio, Proc. Interspeech, 2012.

C. Debes, A. Merentitis, S. Sukhanov, M. Niessen, N. Frangiadakis et al., Monitoring activities of daily living in smart homes: Understanding human behavior, IEEE Signal Processing Magazine, vol.33, issue.2, pp.81-94, 2016.

Y. Zigel, D. Litvak, and I. Gannot, A method for automatic fall detection of elderly people using floor vibrations and soundproof of concept on human mimicking doll falls, IEEE Transactions on Biomedical Engineering, vol.56, issue.12, pp.2858-2867, 2009.

R. Serizel, N. Turpault, H. Eghbal-zadeh, and A. Shah, Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments, Workshop on Detection and Classification of Acoustic Scenes and Events, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01850270

A. Shah, A. Kumar, A. G. Hauptmann, and B. Raj, A closer look at weak label learning for audio events
URL : https://hal.archives-ouvertes.fr/hal-01839252

N. Turpault, R. Serizel, and E. Vincent, Limitations of weak labels for embedding and tagging, ICASSP 2020 -45th International Conference on Acoustics, Speech, and Signal Processing, 2020.
URL : https://hal.archives-ouvertes.fr/hal-02467401

B. Mcfee, J. Salamon, and J. P. Bello, Adaptive pooling operators for weakly labeled sound event detection, IEEE/ACM Trans. on Audio, Speech, and Language Processing, vol.26, issue.11, pp.2180-2193, 2018.

J. Salamon, D. Macconnell, M. Cartwright, P. Li, and J. P. Bello, Scaper: A library for soundscape synthesis and augmentation, Proc. WASPAA, pp.344-348, 2017.

N. Turpault, R. Serizel, A. P. Shah, and J. Salamon, Sound event detection in domestic environments with weakly labeled data and soundscape synthesis, Proc. DCASE Workshop, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02160855

L. Jiakai, Mean teacher convolution system for dcase 2018 task 4," DCASE2018 Challenge, 2018.

L. Delphin-poulat and C. Plapous, Mean teacher with data augmentation for dcase 2019 task 4, Orange Labs Lannion, 2019.

A. Tarvainen and H. Valpola, Mean teachers are better role models: Weight-averaged consistency targets improve semisupervised deep learning results, Proc. NIPS, pp.1195-1204, 2017.

R. Serizel, N. Turpault, A. Shah, and J. Salamon, Sound event detection in synthetic domestic environments, Proc. ICASSP, 2020.
URL : https://hal.archives-ouvertes.fr/hal-02355573

J. F. Gemmeke, D. P. Ellis, D. Freedman, A. Jansen, W. Lawrence et al., Audio set: An ontology and human-labeled dataset for audio events, Proc. ICASSP, 2017.

F. Font, G. Roma, and X. Serra, Freesound technical demo, Proc. ACM, pp.411-412, 2013.

E. Fonseca, X. Favory, J. Pons, F. Font, and X. Serra, FSD50k: an open dataset of human-labeled sound events, 2020.

G. Dekkers, S. Lauwereins, B. Thoen, M. W. Adhana, H. Brouckxon et al., The SINS database for detection of daily activities in a home environment using an acoustic sensor network, Proc. DCASE Workshop, pp.32-36, 2017.

A. Mesaros, T. Heittola, and T. Virtanen, TUT database for acoustic scene classification and sound event detection, 2016 24th European Signal Processing Conference (EUSIPCO). IEEE, pp.1128-1132

, Metrics for polyphonic sound event detection, Applied Sciences, vol.6, issue.6, 2016.

C. Bilen, G. Ferroni, F. Tuveri, J. Azcarreta, and S. Krstulovic, A framework for the robust evaluation of sound event detection, Proc. ICASSP, 2020.

S. Wisdom, H. Erdogan, D. P. Ellis, and J. R. , Hershey, Free Universal Sound Separation (FUSS) dataset, 2020.

P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski et al., Accurate, large minibatch sgd: Training imagenet in 1 hour, 2017.