T. Virtanen, M. Plumbley, and D. Ellis, Computational Analysis of Sound Scenes and Events, 2017.

S. Pascual, M. Ravanelli, J. Serrà, A. Bonafonte, and Y. Bengio, Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks, Proc. Interspeech, pp.161-165, 2019.

L. van der Maaten and K. Weinberger, Stochastic triplet embedding, IEEE International Workshop on Machine Learning for Signal Processing, pp.1-6, 2012.

Y. Xu, Q. Huang, W. Wang, P. Foster, S. Sigtia et al., Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.6, pp.1230-1241, 2017.

M. Cartwright, J. Cramer, J. Salamon, and J. P. Bello, TRICYCLE: Audio representation learning from sensor network data using self-supervision, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, p.5, 2019.

J. Pons, J. Serrà, and X. Serra, Training Neural Audio Classifiers with Few Data, Proc. ICASSP, pp.16-20, 2019.

J. Cramer, H. Wu, J. Salamon, and J. P. Bello, Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings, Proc. ICASSP, 2019.

S. Hershey, S. Chaudhuri, D. P. W. Ellis, J. F. Gemmeke, A. Jansen et al., CNN Architectures for Large-Scale Audio Classification, Proc. ICASSP, 2017.

K. Q. Weinberger, J. Blitzer, and L. K. Saul, Distance Metric Learning for Large Margin Nearest Neighbor Classification, Journal of Machine Learning Research, vol.10, pp.207-244, 2009.

J. Snell, K. Swersky, and R. Zemel, Prototypical Networks for Few-shot Learning, Advances in Neural Information Processing Systems, 2017.

Y. Tokozume, Y. Ushiku, and T. Harada, Learning from between-class examples for deep sound recognition, Proc. ICLR, p.13, 2018.

Z. Lu, Z. Fu, T. Xiang, P. Han, L. Wang et al., Learning from Weak and Noisy Labels for Semantic Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.3, pp.486-500, 2017.

J. Schlüter, Learning to Pinpoint Singing Voice from Weakly Labeled Examples, Proc. ISMIR, pp.44-50, 2016.

A. Kumar and B. Raj, Audio Event Detection Using Weakly Labeled Data, Proceedings of the 24th ACM International Conference on Multimedia, pp.1038-1047, 2016.

B. McFee, J. Salamon, and J. P. Bello, Adaptive pooling operators for weakly labeled sound event detection, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018.

R. Serizel and N. Turpault, Sound Event Detection from Partially Annotated Data: Trends and Challenges, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02114652

A. Shah, A. Kumar, A. G. Hauptmann, and B. Raj, A Closer Look at Weak Label Learning for Audio Events, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01839252

L. Lin and X. Wang, Guided learning convolution system for DCASE 2019 task 4, Chinese Academy of Sciences, Beijing, 2019.

L. Delphin-Poulat and C. Plapous, Mean teacher with data augmentation for DCASE 2019 task 4, 2019.

J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen et al., Audio Set: An ontology and human-labeled dataset for audio events, Proc. ICASSP, 2017.

J. Salamon, C. Jacoby, and J. P. Bello, A Dataset and Taxonomy for Urban Sound Research, Proc. ACM International Conference on Multimedia, pp.1041-1044, 2014.

F. Font, G. Roma, and X. Serra, Freesound Technical Demo, ACM International Conference on Multimedia (MM13), pp.411-412, 2013.

N. Turpault, R. Serizel, A. Shah, and J. Salamon, Sound event detection in domestic environments with weakly labeled data and soundscape synthesis, Accepted to DCASE2019 Workshop, vol.17, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02160855

J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang et al., Learning Fine-Grained Image Similarity with Deep Ranking, Proc. CVPR, pp.1386-1393, 2014.

E. Fonseca, J. Pons, X. Favory, F. Font, D. Bogdanov et al., Freesound datasets: a platform for the creation of open audio datasets, Proc. ISMIR, pp.486-493, 2017.

G. Dekkers, S. Lauwereins, B. Thoen, M. W. Adhana, H. Brouckxon et al., The SINS database for detection of daily activities in a home environment using an acoustic sensor network, Proc. DCASE Workshop, pp.32-36, 2017.

D. Snyder, G. Chen, and D. Povey, MUSAN: A Music, Speech, and Noise Corpus, 2015.

J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello, Scaper: A library for soundscape synthesis and augmentation, Proc. WASPAA, pp.344-348, 2017.

L. Li, K. Jamieson, A. Rostamizadeh, E. Gonina, M. Hardt et al., Massively Parallel Hyperparameter Tuning, 2018.