D. Barchiesi, D. Giannoulis, D. Stowell, and M. D. Plumbley, Acoustic Scene Classification: Classifying environments from the sounds they produce, IEEE Signal Processing Magazine, vol.32, issue.3, pp.16-34, 2015.
DOI : 10.1109/MSP.2014.2326181

A. Mesaros, T. Heittola, and T. Virtanen, TUT database for acoustic scene classification and sound event detection, 2016 24th European Signal Processing Conference (EUSIPCO), 2016.
DOI : 10.1109/EUSIPCO.2016.7760424

A. J. Eronen, V. T. Peltonen, J. T. Tuomi, A. P. Klapuri, S. Fagerlund et al., Audio-based context recognition, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.1, pp.321-329, 2006.
DOI : 10.1109/TSA.2005.854103

X. Valero and F. Alías, Gammatone Cepstral Coefficients: Biologically Inspired Features for Non-Speech Audio Classification, IEEE Transactions on Multimedia, vol.14, issue.6, pp.1684-1689, 2012.
DOI : 10.1109/TMM.2012.2199972

G. Roma, W. Nogueira, and P. Herrera, Recurrence quantification analysis features for environmental sound recognition, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013.
DOI : 10.1109/WASPAA.2013.6701890

A. Rakotomamonjy and G. Gasso, Histogram of gradients of time-frequency representations for audio scene classification, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, issue.1, pp.142-153, 2015.

D. Battaglino, L. Lepauloux, L. Pilati, and N. Evans, Acoustic context recognition using local binary pattern codebooks, 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp.1-5, 2015.
DOI : 10.1109/WASPAA.2015.7336886

V. Bisot, R. Serizel, S. Essid, and G. Richard, Feature Learning With Matrix Factorization Applied to Acoustic Scene Classification, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.6, pp.1216-1229, 2017.
DOI : 10.1109/TASLP.2017.2690570
URL : https://hal.archives-ouvertes.fr/hal-01362864

J. Salamon and J. P. Bello, Unsupervised feature learning for urban sound classification, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7177954

A. Rakotomamonjy, Supervised Representation Learning for Audio Scene Classification, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.6, pp.1253-1265, 2017.
DOI : 10.1109/TASLP.2017.2690561
URL : https://hal.archives-ouvertes.fr/hal-01354115

V. Bisot, R. Serizel, S. Essid, and G. Richard, Supervised nonnegative matrix factorization for acoustic scene classification, Challenge Tech. Rep, 2016.

J. Li, W. Dai, F. Metze, S. Qu, and S. Das, A comparison of Deep Learning methods for environmental sound detection, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
DOI : 10.1109/ICASSP.2017.7952131

H. Eghbal-zadeh, B. Lehner, M. Dorfer, G. Widmer, R. Maass et al., CP-JKU submissions for DCASE-2016: a hybrid approach using binaural i-vectors and deep convolutional neural networks, DCASE2016 Challenge, Tech. Rep, 2016.

A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah et al., DCASE 2017 challenge setup: Tasks, datasets and baseline system, Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), submitted, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01627981

V. Bisot, R. Serizel, S. Essid, and G. Richard, Leveraging deep neural networks with nonnegative representations for improved environmental sound classification, Proc. of Workshop on Machine Learning for Signal Processing, 2017.

D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, issue.6755, pp.788-791, 1999.

J. Mairal, F. Bach, and J. Ponce, Task-Driven Dictionary Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.4, pp.791-804, 2012.
DOI : 10.1109/TPAMI.2011.156
URL : https://hal.archives-ouvertes.fr/inria-00521534

P. Sprechmann, A. M. Bronstein, and G. Sapiro, Supervised non-euclidean sparse NMF via bilevel optimization with applications to speech enhancement, 2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), pp.11-15, 2014.
DOI : 10.1109/HSCMA.2014.6843241

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, International Conference on Machine Learning, pp.448-456, 2015.

B. Mathieu, S. Essid, T. Fillon, J. Prado, and G. Richard, Yaafe, an easy to use and efficient audio feature extraction software, Proc. of International Society for Music Information Retrieval, pp.441-446, 2010.

R. Serizel, S. Essid, and G. Richard, Mini-batch stochastic approaches for accelerated multiplicative updates in nonnegative matrix factorisation with beta-divergence, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), pp.2016-2042, 2016.
DOI : 10.1109/MLSP.2016.7738818
URL : https://hal.archives-ouvertes.fr/hal-01393964

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, The Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, What is the best multi-stage architecture for object recognition?, 2009 IEEE 12th International Conference on Computer Vision, pp.2146-2153, 2009.
DOI : 10.1109/ICCV.2009.5459469

G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors, 2012.