E. Agirre, E. Alfonseca, K. Hall, and J. Kravalova, A study on similarity and relatedness using distributional and wordnet-based approaches, NAACL HLT, pp.19-27, 2009.

M. Baroni, G. Dinu, and G. Kruszewski, Don't count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors, ACL, pp.238-247, 2014.

S. Bird, E. Klein, and E. Loper, Natural language processing with Python: analyzing text with the natural language toolkit, 2009.

D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, Variational inference: A review for statisticians, Journal of the American Statistical Association, vol.112, issue.518, pp.859-877, 2017.

E. Bruni, N. Tran, and M. Baroni, Multimodal distributional semantics, Journal of Artificial Intelligence Research, vol.49, pp.1-47, 2014.

M. Brysbaert, A. B. Warriner, and V. Kuperman, Concreteness ratings for 40 thousand generally known English word lemmas, Behavior Research Methods, vol.46, pp.904-911, 2014.

I. Calixto and Q. Liu, Incorporating global visual features into attention-based neural machine translation, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, (EMNLP), pp.992-1003, 2017.

G. Collell, T. Zhang, and M. Moens, Imagined Visual Representations as Multimodal Embeddings, AAAI, pp.4378-4384, 2017.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., Imagenet: A large-scale hierarchical image database, CVPR, pp.248-255, 2009.

F. Faghri, D. J. Fleet, J. R. Kiros, and S. Fidler, VSE++: Improving Visual-Semantic Embeddings with Hard Negatives, 2017.

C. Fellbaum, WordNet: An Electronic Lexical Database, 1998.

Y. Feng and M. Lapata, Visual information in semantic representation, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.91-99, 2010.

L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan et al., Placing search in context: The concept revisited, WWW, pp.406-414, 2001.

G. Halawi, G. Dror, E. Gabrilovich, and Y. Koren, Large-scale learning of word relatedness with constraints, SIGKDD, pp.1406-1414, 2012.

Z. S. Harris, Distributional structure, Word, vol.10, pp.146-162, 1954.

F. Hill, R. Reichart, and A. Korhonen, Simlex-999: Evaluating semantic models with (genuine) similarity estimation, Computational Linguistics, vol.41, issue.4, pp.665-695, 2015.

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe: Convolutional architecture for fast feature embedding, ACM Multimedia, pp.675-678, 2014.

A. Karpathy and L. Fei-Fei, Deep visual-semantic alignments for generating image descriptions, CVPR, pp.3128-3137, 2015.

D. Kiela and L. Bottou, Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics, EMNLP, pp.36-45, 2014.
DOI : 10.3115/v1/d14-1005
URL : https://doi.org/10.3115/v1/d14-1005

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

D. P. Kingma and M. Welling, Auto-encoding variational Bayes, 2013.

R. Kiros, R. Salakhutdinov, and R. Zemel, Unifying visual-semantic embeddings with multimodal neural language models, 2014.

B. Klein, G. Lev, G. Sadeh, and L. Wolf, Associating neural word embeddings with deep image representations using fisher vectors, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4437-4446, 2015.
DOI : 10.1109/cvpr.2015.7299073

A. Lazaridou and M. Baroni, Combining Language and Vision with a Multimodal Skip-gram Model, NAACL HLT, pp.153-163, 2015.
DOI : 10.3115/v1/n15-1016
URL : https://doi.org/10.3115/v1/n15-1016

O. Levy and Y. Goldberg, Neural word embedding as implicit matrix factorization, NIPS, pp.2177-2185, 2014.

O. Levy, Y. Goldberg, and I. Dagan, Improving distributional similarity with lessons learned from word embeddings, Transactions of the Association for Computational Linguistics, vol.3, pp.211-225, 2015.
DOI : 10.1162/tacl_a_00134
URL : https://doi.org/10.1162/tacl_a_00134

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, NIPS, pp.3111-3119, 2013.

G. A. Miller and W. G. Charles, Contextual correlates of semantic similarity, Language and Cognitive Processes, vol.6, issue.1, pp.1-28, 1991.

N. Mostafazadeh, I. Misra, J. Devlin, M. Mitchell, X. He et al., Generating natural questions about an image, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), 2016.
DOI : 10.18653/v1/p16-1170
URL : https://doi.org/10.18653/v1/p16-1170

J. Pennington, R. Socher, and C. Manning, Glove: Global vectors for word representation, EMNLP, pp.1532-1543, 2014.
DOI : 10.3115/v1/d14-1162
URL : https://doi.org/10.3115/v1/d14-1162

M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark et al., Deep contextualized word representations, Proc. of NAACL, 2018.
DOI : 10.18653/v1/n18-1202
URL : https://doi.org/10.18653/v1/n18-1202

K. Radinsky, E. Agichtein, E. Gabrilovich, and S. Markovitch, A word at a time: computing word relatedness using temporal semantic analysis, WWW, pp.337-346, 2011.

H. Rubenstein and J. Goodenough, Contextual correlates of synonymy, Communications of the ACM, vol.8, issue.10, pp.627-633, 1965.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision, vol.115, pp.211-252, 2015.
DOI : 10.1007/s11263-015-0816-y
URL : http://arxiv.org/pdf/1409.0575

C. Silberer, V. Ferrari, and M. Lapata, Visually grounded meaning representations, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, pp.2284-2297, 2017.
DOI : 10.1109/tpami.2016.2635138
URL : https://www.pure.ed.ac.uk/ws/files/29421203/submission2_2_1.pdf

C. Silberer and M. Lapata, Learning Grounded Meaning Representations with Autoencoders, ACL, pp.721-732, 2014.
DOI : 10.3115/v1/p14-1068
URL : https://doi.org/10.3115/v1/p14-1068

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, CVPR, pp.1-9, 2015.
DOI : 10.1109/cvpr.2015.7298594
URL : http://arxiv.org/pdf/1409.4842

P. Young, A. Lai, M. Hodosh, and J. Hockenmaier, From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, Transactions of the Association for Computational Linguistics, vol.2, pp.67-78, 2014.