VQA: Visual Question Answering, 2015 IEEE International Conference on Computer Vision (ICCV), 2015. ,
DOI : 10.1109/ICCV.2015.279
URL : http://m-mitchell.com/papers/1505.00468v2.pdf
MUTAN: Multimodal Tucker Fusion for Visual Question Answering, 2017. ,
Words Jump-Start Vision: A Label Advantage in Object Recognition, Journal of Neuroscience, vol.35, issue.25, pp.9329-9335, 2015. ,
DOI : 10.1523/JNEUROSCI.5111-14.2015
URL : http://www.jneurosci.org/content/jneuro/35/25/9329.full.pdf
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014. ,
DOI : 10.3115/v1/D14-1179
URL : https://hal.archives-ouvertes.fr/hal-01433235
Visual Dialog, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. ,
DOI : 10.1109/CVPR.2017.121
GuessWhat?! Visual Object Discovery through Multi-modal Dialogue, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. ,
DOI : 10.1109/CVPR.2017.475
URL : https://hal.archives-ouvertes.fr/hal-01549641
A Learned Representation For Artistic Style, Proc. of ICLR, 2017. ,
Introduction to the special issue on language–vision interactions, Journal of Memory and Language, vol.57, issue.4, pp.455-459, 2007. ,
DOI : 10.1016/j.jml.2007.08.002
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016. ,
DOI : 10.18653/v1/D16-1044
Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997. ,
DOI : 10.1162/neco.1997.9.8.1735
Hierarchical question-image co-attention for visual question answering, Proc. of NIPS, 2016. ,
Deep residual learning for image recognition, Proc. of CVPR, 2016. ,
Multimodal residual learning for visual qa, Proc. of NIPS, 2016. ,
Hadamard product for low-rank bilinear pooling, Proc. of ICLR, 2017. ,
Prior Expectations Evoke Stimulus Templates in the Primary Visual Cortex, Journal of Cognitive Neuroscience, vol.26, issue.7, pp.1546-1554, 2014. ,
DOI : 10.1162/jocn_a_00562
Microsoft COCO: Common Objects in Context, Proc of ECCV, 2014. ,
DOI : 10.1007/978-3-319-10602-1_48
URL : http://arxiv.org/pdf/1405.0312.pdf
Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images, 2015 IEEE International Conference on Computer Vision (ICCV), 2015. ,
DOI : 10.1109/ICCV.2015.9
Ask Your Neurons: A Deep Learning Approach to Visual Question Answering, International Journal of Computer Vision, vol.1, issue.2, 2016. ,
DOI : 10.1109/ICCV.2013.211
Glove: Global Vectors for Word Representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014. ,
DOI : 10.3115/v1/D14-1162
Exploring models and data for image question answering, Proc. of NIPS, 2015. ,
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Proc. of ICML, 2015. ,
Very deep convolutional networks for large-scale image recognition, 2015. ,
Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering, Proc. of CVPR, 2017. ,
Unconscious effects of language-specific terminology on preattentive color perception, Proceedings of the National Academy of Sciences, vol.106, issue.11, pp.4567-4570, 2009. ,
DOI : 10.1073/pnas.0811155106
Visualizing data using t-sne, JMLR, vol.9, pp.2579-2605, 2008. ,
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering, Proc. of ECCV, 2015. ,
DOI : 10.1007/978-3-642-33715-4_54
URL : http://arxiv.org/pdf/1511.05234
Show, attend and tell: Neural image caption generation with visual attention, Proc. of ICML, 2015. ,
Stacked attention networks for image question answering, Proc. of CVPR, 2016. ,