M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen et al., TensorFlow: Large-scale machine learning on heterogeneous distributed systems, 2016.

K. Choi, G. Fazekas, and M. Sandler, Explaining deep convolutional neural networks on music classification, 2016.

G. E. Dahl, D. Yu, L. Deng, and A. Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.1, pp.30-42, 2012.

T. Giannakopoulos, pyAudioAnalysis: An open-source Python library for audio signal analysis, PLoS ONE, vol.10, issue.12, p.e0144610, 2015.

T. Giannakopoulos and A. Pikrakis, Introduction to Audio Analysis: A MATLAB Approach, 2014.

T. Giannakopoulos, G. Siantikos, S. Perantonis, N. E. Votsi, and J. Pantis, Automatic soundscape quality estimation using audio analysis, Proceedings of the 8th ACM International Conference on PErvasive Technologies Related to Assistive Environments, p.19, 2015.

T. Grill and J. Schlüter, Music boundary detection using neural networks on spectrograms and self-similarity lag matrices, Signal Processing Conference (EUSIPCO), pp.1296-1300, 2015.

G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. R. Mohamed et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.

Z. Huang, M. Dong, Q. Mao, and Y. Zhan, Speech emotion recognition using CNN, Proceedings of the 22nd ACM International Conference on Multimedia, pp.801-804, 2014.

M. Huzaifah, Comparison of time-frequency representations for environmental sound classification using convolutional neural networks, 2017.

H.-G. Kim, N. Moreau, and T. Sikora, MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval, 2005.

P. Khunarsal, C. Lursinsap, and T. Raicharoen, Very short time environmental sound classification based on spectrogram pattern matching, Information Sciences, vol.243, pp.57-74, 2013.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, pp.1097-1105, 2012.

L. I. Kuncheva and C. J. Whitaker, Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy, Machine Learning, vol.51, issue.2, pp.181-207, 2003.

H. Lee, P. Pham, Y. Largman, and A. Y. Ng, Unsupervised feature learning for audio classification using convolutional deep belief networks, Advances in Neural Information Processing Systems, pp.1096-1104, 2009.

A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah et al., DCASE 2017 challenge setup: Tasks, datasets and baseline system, DCASE 2017-Workshop on Detection and Classification of Acoustic Scenes and Events, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01627981

A. Mesaros, T. Heittola, and T. Virtanen, TUT database for acoustic scene classification and sound event detection, Signal Processing Conference (EUSIPCO), pp.1128-1132, 2016.

K. J. Piczak, Environmental sound classification with convolutional neural networks, 2015 IEEE 25th International Workshop on, pp.1-6, 2015.

K. J. Piczak, ESC: Dataset for environmental sound classification, Proceedings of the 23rd ACM International Conference on Multimedia, pp.1015-1018, 2015.

J. Salamon and J. P. Bello, Deep convolutional neural networks and data augmentation for environmental sound classification, IEEE Signal Processing Letters, vol.24, issue.3, pp.279-283, 2017.

J. Salamon, C. Jacoby, and J. P. Bello, A dataset and taxonomy for urban sound research, Proceedings of the 22nd ACM International Conference on Multimedia, pp.1041-1044, 2014.

S. Scardapane, D. Comminiello, M. Scarpiniti, and A. Uncini, Music classification using extreme learning machines, 2013 8th International Symposium on Image and Signal Processing and Analysis (ISPA), pp.377-381, 2013.

J. Schlüter and S. Böck, CNN-based audio onset detection MIREX submission.

J. Schlüter and S. Böck, Musical onset detection with convolutional neural networks, 6th International Workshop on Machine Learning and Music (MML), 2013.

J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks, vol.61, pp.85-117, 2015.

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

A. Subramaniam, V. Patel, A. Mishra, P. Balasubramanian, and A. Mittal, Bimodal first impressions recognition using temporally ordered deep audio and stochastic visual features, European Conference on Computer Vision, pp.337-348, 2016.

S. Theodoridis and K. Koutroumbas, Pattern Recognition, Fourth Edition, 2008.

M. Thorogood, J. Fan, and P. Pasquier, Soundscape audio signal classification and segmentation using listeners perception of background and foreground sound, Journal of the Audio Engineering Society, vol.64, issue.7/8, pp.484-492, 2016.

J. Ye, T. Kobayashi, and M. Murakawa, Urban sound event classification based on local and global features aggregation, Applied Acoustics, vol.117, pp.246-256, 2017.

C. Zhang, G. Evangelopoulos, S. Voinea, L. Rosasco, and T. Poggio, A deep representation for invariance and music classification, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6984-6988, 2014.