M. Campr and K. Ježek, Comparing Semantic Models for Evaluating Automatic Document Summarization, Text, Speech, and Dialogue, 2015.
DOI : 10.1007/978-3-319-24033-6_29

M. Cha, Y. Gwon, and H. T. Kung, Multimodal sparse representation learning and applications, CoRR, abs/1511, 2015.

F. Feng, X. Wang, and R. Li, Cross-modal retrieval with correspondence autoencoder, ACM Intl. Conf. on Multimedia, pp.7-16, 2014.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, AISTATS, vol.9, pp.249-256, 2010.

C. Guinaudeau, A. R. Simon, G. Gravier, and P. Sébillot, HITS and IRISA at MediaEval 2013: Search and hyperlinking task, Working Notes MediaEval Workshop, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00906249

G. E. Hinton and R. R. Salakhutdinov, Reducing the Dimensionality of Data with Neural Networks, Science, vol.313, issue.5786, pp.504-507, 2006.
DOI : 10.1126/science.1127647

L. Jiang, S. Yu, D. Meng, Y. Yang, T. Mitamura et al., Fast and Accurate Content-based Semantic Search in 100M Internet Videos, Proceedings of the 23rd ACM international conference on Multimedia, MM '15, pp.49-58, 2015.
DOI : 10.1145/2733373.2806237

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems, pp.1097-1105, 2012.

Q. V. Le and T. Mikolov, Distributed representations of sentences and documents, ICML, pp.1188-1196, 2014.

H. Lu, Y. Liou, H. Lee, and L. Lee, Semantic retrieval of personal photos using a deep autoencoder fusing visual features with speech annotations represented as word/paragraph vectors, Annual Conf. of the Intl. Speech Communication Association, 2015.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, 2013.

J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, Multimodal deep learning, Intl. Conf. on Machine Learning, 2011.

P. Over, J. Fiscus, G. Sanders, D. Joy, M. Michel et al., TRECVID 2014: an overview of the goals, tasks, data, evaluation mechanisms and metrics, Proceedings of TRECVID, p.52, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01230444

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision, vol.115, issue.3, pp.211-252, 2015.
DOI : 10.1007/s11263-015-0816-y

M. Eskevich, R. Aly, D. N. Racca, R. Ordelman, S. Chen, and G. J. F. Jones, The Search and Hyperlinking Task at MediaEval 2014, Working Notes MediaEval Workshop, 2014.

A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, CNN features off-the-shelf: an astounding baseline for recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.806-813, 2014.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, Proceedings of ICLR, 2015.

N. Srivastava and R. Salakhutdinov, Learning representations for multimodal data with deep belief nets, Intl. Conf. on Machine Learning, 2012.

T. Tommasi, T. Tuytelaars, and B. Caputo, A testbed for cross-dataset analysis. CoRR, abs/1402, 2014.

V. Vukotić, C. Raymond, and G. Gravier, Bidirectional joint representation learning with symmetrical deep neural networks for multimodal and crossmodal applications, Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, pp.343-346, 2016.

J. Weston, S. Bengio, and N. Usunier, Large scale image annotation: learning to rank with joint word-image embeddings, Machine Learning, vol.81, issue.1, pp.21-35, 2010.
DOI : 10.1007/s10994-010-5198-3

G. Ye, Y. Li, H. Xu, D. Liu, and S. Chang, EventNet, Proceedings of the 23rd ACM international conference on Multimedia, MM '15, pp.471-480, 2015.
DOI : 10.1145/2733373.2806221