Z. Akata, F. Perronnin, Z. Harchaoui, and C. Schmid, Label-Embedding for Attribute-Based Classification, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.111
URL : https://hal.archives-ouvertes.fr/hal-00815747

R. Aly, R. Arandjelovic, K. Chatfield, M. Douze, B. Fernando et al., The AXES submissions at TrecVid 2013, TRECVID Workshop, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00904404

J. Andén and S. Mallat, Multiscale scattering for audio classification, ISMIR, 2011.

J. Andén and S. Mallat, ScatNet (v0.2), 2013.

C. Chang and C. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol.2, issue.3, pp.27:1-27:27, 2011.
DOI : 10.1145/1961189.1961199

S. Clinchant, J. Renders, and G. Csurka, Transmedia pseudo-relevance feedback methods in multimedia retrieval, Advances in Multilingual and Multimodal Information Retrieval, 2008.

N. Dalal, B. Triggs, and C. Schmid, Human Detection Using Oriented Histograms of Flow and Appearance, ECCV, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00548587

P. Fousek, L. Lamel, and J. Gauvain, On the Use of MLP Features for Broadcast News Transcription, Text, Speech and Dialogue, pp.303-310, 2008.
DOI : 10.1007/978-3-540-87391-4_39

J. Gauvain, L. Lamel, and G. Adda, Partitioning and transcription of broadcast news data, ICSLP'98, vol.5, pp.1335-1338, 1998.

H. Hermansky, Perceptual linear predictive (PLP) analysis of speech, The Journal of the Acoustical Society of America, vol.87, issue.4, pp.1738-1752, 1990.
DOI : 10.1121/1.399423

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe: Convolutional Architecture for Fast Feature Embedding, Proceedings of the ACM International Conference on Multimedia, MM '14, 2014.
DOI : 10.1145/2647868.2654889

J. Krapac, J. Verbeek, and F. Jurie, Modeling spatial layout with fisher vectors for image categorization, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126406
URL : https://hal.archives-ouvertes.fr/inria-00612277

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126543

L. Lamel and J. Gauvain, Speech Processing for Audio Indexing, Proceedings of the 6th International Conference on Natural Language Processing, GoTAL 2008 - Advances in Natural Language Processing, pp.4-15, 2008.

D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

J. Matas, O. Chum, M. Urban, and T. Pajdla, Robust wide baseline stereo from maximally stable extremal regions, BMVC, 2002.

D. Oneata, J. Verbeek, and C. Schmid, Action and Event Recognition with Fisher Vectors on a Compact Feature Set, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.228
URL : https://hal.archives-ouvertes.fr/hal-00873662

I. Oparin, L. Lamel, and J. Gauvain, Rapid development of a Latvian speech-to-text system, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
DOI : 10.1109/ICASSP.2013.6639082

P. Over, G. Awad, M. Michel, J. Fiscus, G. Sanders et al., TRECVID 2014 - An overview of the goals, tasks, data, evaluation mechanisms and metrics, Proceedings of TRECVID 2014, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01230444

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi Speech Recognition Toolkit, Proc. Workshop on Automatic Speech Recognition & Understanding (ASRU), pp.1-4, 2011.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision, vol.115, issue.3, pp.211-252, 2015.
DOI : 10.1007/s11263-015-0816-y

J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, Image Classification with the Fisher Vector: Theory and Practice, International Journal of Computer Vision, vol.105, issue.3, pp.222-245, 2013.
DOI : 10.1007/s11263-013-0636-x

P. Schwarz, P. Matějka, and J. Černocký, Towards Lower Error Rates in Phoneme Recognition, Text, Speech and Dialogue, pp.465-472, 2004.
DOI : 10.1007/978-3-540-30120-2_59

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.441
URL : https://hal.archives-ouvertes.fr/hal-00873267

Q. Zhu, A. Stolcke, B. Y. Chen, and N. Morgan, Using MLP features in SRI's conversational speech recognition system, Interspeech, pp.2141-2144, 2005.