M. Bugalho, J. Portelo, I. Trancoso, T. Pellegrini, and A. Abad, Detecting audio events for semantic video search, Interspeech, pp.1151-1154, 2009.

A. J. Eronen, V. T. Peltonen, J. T. Tuomi, A. P. Klapuri, S. Fagerlund et al., Audio-based context recognition, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.1, pp.321-329, 2006.
DOI : 10.1109/TSA.2005.854103

D. Stowell and D. Clayton, Acoustic event detection for multiple overlapping similar sources, 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2015.
DOI : 10.1109/WASPAA.2015.7336885

URL : http://arxiv.org/abs/1503.07150

S. Goetze, J. Schröder, S. Gerlach, D. Hollosi, J. Appell et al., Acoustic Monitoring and Localization for Social Care, Journal of Computing Science and Engineering, vol.6, issue.1, pp.40-50, 2012.
DOI : 10.5626/JCSE.2012.6.1.40

D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley, Detection and Classification of Acoustic Scenes and Events, IEEE Transactions on Multimedia, vol.17, issue.10, pp.1733-1746, 2015.
DOI : 10.1109/TMM.2015.2428998

URL : https://hal.archives-ouvertes.fr/hal-01123760

D. Barchiesi, D. Giannoulis, D. Stowell, and M. Plumbley, Acoustic Scene Classification: Classifying environments from the sounds they produce, IEEE Signal Processing Magazine, vol.32, issue.3, pp.16-34, 2015.
DOI : 10.1109/MSP.2014.2326181

J. Aucouturier, B. Defreville, and F. Pachet, The bag-of-frames approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music, The Journal of the Acoustical Society of America, vol.122, issue.2, pp.881-891, 2007.
DOI : 10.1121/1.2750160

T. Heittola, A. Mesaros, A. Eronen, and T. Virtanen, Audio context recognition using audio event histograms, 18th European Signal Processing Conference, pp.1272-1276, 2010.

A. Rakotomamonjy and G. Gasso, Histogram of gradients of time-frequency representations for audio scene classification, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, issue.1, pp.142-153, 2015.

B. Elizalde, A. Kumar, A. Shah, R. Badlani, E. Vincent et al., Experiments on the DCASE challenge 2016: Acoustic scene classification and sound event detection in real life recording, DCASE2016 Workshop on Detection and Classification of Acoustic Scenes and Events, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01354007

A. Mesaros, T. Heittola, A. Eronen, and T. Virtanen, Acoustic event detection in real-life recordings, 18th European Signal Processing Conference, pp.1267-1271, 2010.

J. Salamon, C. Jacoby, and J. P. Bello, A dataset and taxonomy for urban sound research, 22nd ACM International Conference on Multimedia (ACM-MM'14), 2014.

E. Çakır, G. Parascandolo et al., Convolutional recurrent neural networks for polyphonic sound event detection, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.6, pp.1291-1303, 2017.

J. Gemmeke, L. Vuegen, P. Karsmakers, B. Vanrumste, and H. Van hamme, An exemplar-based NMF approach to audio event detection, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp.1-4, 2013.
DOI : 10.1109/WASPAA.2013.6701847

A. Mesaros, O. Dikmen, T. Heittola, and T. Virtanen, Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.151-155, 2015.
DOI : 10.1109/ICASSP.2015.7177950

E. Marchi, F. Vesperini, F. Eyben, S. Squartini, and B. Schuller, A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural networks, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7178320

B. Byun, I. Kim, S. M. Siniscalchi, and C. Lee, Consumer-level multimedia event detection through unsupervised audio signal modeling, Interspeech, pp.2081-2084, 2012.

B. Elizalde, G. Friedland, H. Lei, and A. Divakaran, There is no data like less data, Proceedings of the 2012 ACM international workshop on Audio and multimedia methods for large-scale video analysis, AMVA '12, pp.27-32, 2012.
DOI : 10.1145/2390214.2390223

Y. Xu, Q. Huang, W. Wang, P. Foster, S. Sigtia et al., Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.6, pp.1230-1241, 2017.
DOI : 10.1109/TASLP.2017.2690563

W. Han, E. Coutinho, H. Ruan, H. Li, B. Schuller et al., Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments, PLOS ONE, vol.11, issue.9, p.e0162075, 2016.
DOI : 10.1371/journal.pone.0162075

A. Shah, R. Badlani, A. Kumar, B. Elizalde, and B. Raj, An approach for self-training audio event detectors using web data, 2016.

A. Kumar and B. Raj, Weakly supervised scalable audio content analysis, 2016 IEEE International Conference on Multimedia and Expo (ICME), 2016.
DOI : 10.1109/ICME.2016.7552989

URL : http://arxiv.org/pdf/1606.03664

T. Heittola, A. Diment, and A. Mesaros, DCASE2017 baseline system, https://github.com/TUT-ARG/DCASE2017-baseline-system, 2017.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2014.

A. Mesaros, T. Heittola, and T. Virtanen, TUT database for acoustic scene classification and sound event detection, 2016 24th European Signal Processing Conference (EUSIPCO), 2016.
DOI : 10.1109/EUSIPCO.2016.7760424

T. Giannakopoulos, pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis, PLOS ONE, vol.10, issue.12, p.e0144610, 2015.
DOI : 10.1371/journal.pone.0144610

URL : https://doi.org/10.1371/journal.pone.0144610

G. Forman and M. Scholz, Apples-to-apples in cross-validation studies, ACM SIGKDD Explorations Newsletter, vol.12, issue.1, pp.49-57, 2010.
DOI : 10.1145/1882471.1882479

J. F. Gemmeke, D. P. Ellis, D. Freedman, A. Jansen, W. Lawrence et al., Audio Set: An ontology and human-labeled dataset for audio events, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
DOI : 10.1109/ICASSP.2017.7952261