L. Jiang, S. Yu, D. Meng, Y. Yang, T. Mitamura et al., Fast and Accurate Content-based Semantic Search in 100M Internet Videos, Proceedings of the 23rd ACM international conference on Multimedia, MM '15, 2015.
DOI : 10.1145/2733373.2806237

S. Chu, S. Narayanan, C. J. Kuo, and M. Mataric, Where am I? Scene Recognition for Mobile Robots using Audio Features, 2006 IEEE International Conference on Multimedia and Expo, pp.885-888, 2006.
DOI : 10.1109/ICME.2006.262661

M. Janvier, X. Alameda-Pineda, L. Girin, and R. Horaud, Sound-event recognition with a companion humanoid, 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pp.104-111, 2012.
DOI : 10.1109/HUMANOIDS.2012.6651506
URL : https://hal.archives-ouvertes.fr/hal-00768767

A. Mesaros, T. Heittola, and T. Virtanen, TUT database for acoustic scene classification and sound event detection, 2016 24th European Signal Processing Conference (EUSIPCO), 2016.
DOI : 10.1109/EUSIPCO.2016.7760424

G. Roma, W. Nogueira, and P. Herrera, Recurrence quantification analysis features for auditory scene classification, IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events, 2013.

A. Rakotomamonjy and G. Gasso, Histogram of gradients of time-frequency representations for audio scene classification, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol.23, issue.1, pp.142-153, 2015.

J. Schröder, N. Moritz, M. R. Schädler, B. Cauchi, K. Adiloglu et al., On the use of spectro-temporal features for the IEEE AASP challenge ‘detection and classification of acoustic scenes and events’, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013.
DOI : 10.1109/WASPAA.2013.6701868

J. F. Gemmeke, L. Vuegen, P. Karsmakers, B. Vanrumste, and H. Van hamme, An exemplar-based NMF approach to audio event detection, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp.1-4, 2013.
DOI : 10.1109/WASPAA.2013.6701847

L. Fei-Fei and P. Perona, A Bayesian hierarchical model for learning natural scene categories, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.

J. Gauvain and C. Lee, Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE Transactions on Speech and Audio Processing, vol.2, issue.2, pp.291-298, 1994.
DOI : 10.1109/89.279278

F. Bimbot, J. Bonastre, C. Fredouille, G. Gravier, I. Magrin-Chagnolleau et al., A Tutorial on Text-Independent Speaker Verification, EURASIP Journal on Advances in Signal Processing, vol.2004, issue.4, pp.430-451, 2004.
DOI : 10.1155/S1110865704310024
URL : https://hal.archives-ouvertes.fr/hal-01434501

J. Zhang, M. Marszałek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, International Journal of Computer Vision, vol.73, issue.2, pp.213-238, 2007.
DOI : 10.1007/s11263-006-9794-4
URL : https://hal.archives-ouvertes.fr/inria-00548574

A. Vedaldi and A. Zisserman, Efficient Additive Kernels via Explicit Feature Maps, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.3, pp.480-492, 2012.
DOI : 10.1109/TPAMI.2011.153

P. Li, G. Samorodnitsky, and J. Hopcroft, Sign Cauchy projections and chi-square kernel, Advances in Neural Information Processing Systems, pp.2571-2579, 2013.

C. Chang and C. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol.2, issue.3, pp.27:1-27:27, 2011.

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, LIBLINEAR: A library for large linear classification, Journal of Machine Learning Research, vol.9, pp.1871-1874, 2008.

B. Elizalde, M. Ravanelli, K. Ni, D. Borth, and G. Friedland, Audio-concept features and hidden Markov models for multimedia event detection.

B. Elizalde and G. Friedland, Lost in segmentation: Three approaches for speech/non-speech detection in consumer-produced videos, 2013 IEEE International Conference on Multimedia and Expo (ICME), pp.1-6, 2013.
DOI : 10.1109/ICME.2013.6607486

B. Elizalde, G. Friedland, H. Lei, and A. Divakaran, There is no data like less data, Proceedings of the 2012 ACM international workshop on Audio and multimedia methods for large-scale video analysis, AMVA '12, 2012.
DOI : 10.1145/2390214.2390223

M. R. Schädler, B. T. Meyer, and B. Kollmeier, Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition, The Journal of the Acoustical Society of America, vol.131, issue.5, pp.4134-4151, 2012.

M. R. Schädler and B. Kollmeier, Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition, The Journal of the Acoustical Society of America, vol.137, issue.4, pp.2047-2059, 2015.

L. Sifre and S. Mallat, Rotation, Scaling and Deformation Invariant Scattering for Texture Discrimination, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.1233-1240, 2013.
DOI : 10.1109/CVPR.2013.163

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

R. S. Olson, R. J. Urbanowicz, P. C. Andrews et al., Automating Biomedical Data Science Through Tree-Based Pipeline Optimization, Proceedings of the 18th European Conference on the Applications of Evolutionary and Bio-inspired Computation, ser. Lecture Notes in Computer Science, 2016.
DOI : 10.1007/978-3-319-31204-0_9
URL : http://arxiv.org/abs/1601.07925

A. Kumar and B. Raj, Audio Event Detection using Weakly Labeled Data, Proceedings of the 2016 ACM on Multimedia Conference, MM '16, 2016.
DOI : 10.1145/2964284.2964310

J. Chen, Y. Wang, and D. Wang, Noise Perturbation Improves Supervised Speech Separation, International Conference on Latent Variable Analysis and Signal Separation, pp.83-90, 2015.
DOI : 10.1007/978-3-319-22482-4_10

N. Kanda, R. Takeda, and Y. Obuchi, Elastic spectral distortion for low resource speech recognition with deep neural networks, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pp.309-314, 2013.
DOI : 10.1109/ASRU.2013.6707748