Analyzing the Behavior of Visual Question Answering Models, EMNLP, p.36, 2016. ,
, A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset, p.37, 2017.
, Novel Object Captioning at Scale, p.37, 2018.
Unsupervised learning from Narrated Instruction Videos, CVPR, p.54, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01171193
Measuring the Objectness of Image Windows, IEEE Transactions on Pattern Analysis and Machine Intelligence, p.21, 2012. ,
SPICE: Semantic Propositional Image Caption Evaluation, ECCV, vol.30, p.36, 2016. ,
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, CVPR, vol.33, p.52, 2018. ,
Deep Compositional Question Answering with Neural Module Networks, CVPR, vol.35, p.67, 2016. ,
Learning to Compose Neural Networks for Question Answering, HLT-NAACL, p.35, 2016. ,
Deep Canonical Correlation Analysis, ICML, p.31, 2013. ,
, Visual Question Answering. In ICCV, vol.26, p.27, 2015.
Learning to generalize to new compositions in image understanding, p.95, 2016. ,
Model Transfer for Object Category Detection, ICCV, p.96, 2011. ,
DIFFRAC: a discriminative and flexible framework for clustering, NIPS, vol.16, p.73, 2007. ,
Neural Machine Translation by Jointly Learning to Align and Translate, ICLR, p.31, 2015. ,
METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, p.36, 2005. ,
RelNet: End-to-end modeling of entities & relations, p.94, 2017. ,
Patchtable: efficient patch queries for large datasets and applications, ACM Trans. Graph, p.62, 2015. ,
Interaction networks for learning about objects, relations and physics, NIPS, vol.35, p.94, 2016. ,
, Multimodal Tucker Fusion for Visual Question Answering. In ICCV, p.31, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02073637
, Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection. In AAAI, p.31, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02073644
Stylizing animation by example, ACM Trans. Graph, p.62, 2013. ,
Weakly Supervised Deep Detection Networks, CVPR, vol.53, p.68, 2016. ,
Weakly supervised object detection with convex clustering, CVPR, p.53, 2015. ,
Actions as space-time shapes, ICCV, p.39, 2005. ,
Finding actors and actions in movies, ICCV, p.54, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00904991
Weakly supervised action labeling in videos under ordering constraints, ECCV, vol.54, p.69, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01053967
Weakly-supervised alignment of video with text, ICCV, vol.54, p.87, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01154523
A Semantic Matching Energy Function for Learning with Multi-Relational Data, Machine Learning, vol.49, p.50, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00835282
On the Biological Plausibility of Grandmother Cells: Implications for Neural Network Theories in Psychology and Neuroscience, Psychological Review, issue.7, 2009. ,
MUREL: Multimodal Relational Reasoning for Visual Question Answering, CVPR, p.35, 2019. ,
URL : https://hal.archives-ouvertes.fr/hal-02073649
Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields, CVPR, p.42, 2017. ,
Toward an Architecture for Never-Ending Language Learning, AAAI, p.46, 2010. ,
Deep Clustering for Unsupervised Learning of Visual Features, In ECCV, vol.87, 2018. ,
Text to 3D Scene Generation with Rich Lexical Grounding, ACL, vol.35, p.67, 2015. ,
Learning Spatial Knowledge for Text to 3D Scene Generation, EMNLP, p.35, 2014. ,
HICO: A benchmark for recognizing human-object interactions in images, ICCV, vol.40, p.106, 2015. ,
Learning to Detect Human-Object Interactions, WACV, vol.40, p.110, 2018. ,
Webly Supervised Learning of Convolutional Networks, ICCV, p.54, 2015. ,
Extracting Visual Knowledge from Web Data, ICCV, vol.55, p.67, 2013. ,
Consistent Image Analogies using Semi-supervised Learning, CVPR, p.62, 2008. ,
Pose-based CNN Features for Action Recognition, ICCV, p.131, 2015. ,
A flexible model for training action localization with varying levels of supervision, NIPS, p.137, 2018. ,
Unsupervised Object Discovery and Localization in the Wild: Part-based Matching with Bottom-up Region Proposals, CVPR, p.55, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01110036
Context models and out-of-context objects, Pattern Recognition Letters, vol.68, p.69, 2012. ,
Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes, EMNLP, p.139, 2016. ,
Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning, IEEE Transactions ,
URL : https://hal.archives-ouvertes.fr/hal-01123482
Support-Vector Networks, Machine Learning, 1995. ,
Detecting Visual Relationships with Deep Relational Networks, CVPR, vol.42, p.95, 2017. ,
Histograms of oriented gradients for human detection, CVPR, 2005. ,
URL : https://hal.archives-ouvertes.fr/inria-00548512
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? In CVIU, p.37, 2017. ,
, Visual Dialog. In CVPR, vol.27, p.28, 2017.
Learning person-object interactions for action recognition in still images, NIPS, vol.67, p.131, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00648156
Recognizing human actions in still images: a study of bag-of-features and part-based representations, BMVC, vol.44, p.45, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-01060885
ImageNet: A large-scale hierarchical image database, CVPR, vol.42, p.54, 2009. ,
Large-Scale Object Classification Using Label Relation Graphs, ECCV, vol.57, p.124, 2014. ,
Detecting Actions, Poses, and Objects with Relational Phraselets, ECCV, p.41, 2012. ,
Discriminative models for static human-object interactions, CVPR Workshops (SMiCV), vol.44, p.67, 2010. ,
Discriminative Models for Multi-Class Object Layout, In International Journal of Computer Vision, issue.23, 2011. ,
Solving the Multiple Instance Problem with Axis-Parallel Rectangles, Artificial Intelligence, p.53, 1997. ,
How to make words with vectors: Phrase generation in distributional semantics, ACL, p.51, 2014. ,
Learning Everything about Anything: Webly-Supervised Visual Concept Learning, CVPR, vol.50, p.95, 2014. ,
Scalable fact learning in images, AAAI, vol.49, p.68, 2016. ,
Image Description using Visual Dependency Representations, EMNLP, p.31, 2013. ,
Finding Beans in Burgers: Deep Semantic-Visual Embedding with Localization, CVPR, p.37, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-02171857
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives, BMVC, p.31, 2017. ,
"Clustering by Composition" - Unsupervised Discovery of Image Categories, IEEE Transactions on Pattern Analysis and Machine Intelligence, p.55, 2012. ,
From Captions to Visual Concepts and Back, CVPR, vol.32, p.68, 2015. ,
Attribute-Centric Recognition for Cross-category Generalization, p.56 ,
Describing Objects by their Attributes, CVPR, vol.56, p.131, 2009. ,
Every picture tells a story: Generating sentences from images, ECCV, p.26, 2010. ,
Object Detection with Discriminatively Trained Part Based Models, vol.21 ,
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, p.21, 1997. ,
DeViSE: A Deep Visual-Semantic Embedding Model, NIPS, vol.59, p.69, 2013. ,
, Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, p.31, 2016.
Object Categorization using Cooccurrence, Location and Appearance, CVPR, vol.23, p.68, 2008. ,
ICAN: Instance-Centric Attention Network for Human-Object Interaction Detection, BMVC, vol.42, p.116, 2018. ,
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question, NIPS, p.27, 2015. ,
Visual turing test for computer vision systems, Proceedings of the National Academy of Sciences of the United States of America, p.27, 2015. ,
Structure Mapping: A Theoretical Framework for Analogy, Cognitive Science, p.60, 1983. ,
Analogical Reasoning, Psychology Of, vol.60, p.61, 2003. ,
Analogical Reasoning. Encyclopedia of Human Behavior, p.61, 2012. ,
, ICCV, vol.75, p.82, 2015.
Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR, vol.22, p.77, 2014. ,
Contextual Action Recognition with R*CNN, ICCV, p.53, 2015. ,
Detecting and Recognizing Human-Object Interactions, vol.42, p.116, 2018. ,
A multi-view embedding space for modeling internet images, tags, and their semantics, p.31, 2014. ,
, Generative Adversarial Nets. In NIPS, p.137, 2014.
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering, CVPR, p.37, 2017. ,
Scene graph generation with external knowledge and image reconstruction, CVPR, p.47, 2019. ,
DensePose: Dense Human Pose Estimation in the Wild, In CVPR, p.42, 2018. ,
A survey on still image based human action recognition, Pattern Recognition, p.39, 2014. ,
Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers, ECCV, p.67, 2008. ,
Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, p.67, 2009. ,
, Visual Role Semantic Labeling, vol.40, p.110, 2015.
Canonical Correlation Analysis: An Overview with Application to Learning Methods, Neural Computation, p.31, 2004. ,
Deconstructing Visual Scenes in Cortex: Gradients of Object and Spatial Layout Information, Cerebral cortex, issue.7, 2013. ,
Combining efficient object localization and image classification, ICCV, p.21, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00439516
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, p.22, 2014. ,
, ICCV, vol.42, p.43, 2017.
Generating Visual Explanations, ECCV, p.37, 2016. ,
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data, CVPR, vol.37, p.69, 2016. ,
Women Also Snowboard: Overcoming Bias in Captioning Models, In ECCV, p.36, 2018. ,
Image analogies, SIGGRAPH, p.61, 2001. ,
Discriminative Learning of Open-Vocabulary Object Retrieval and Localization by Negative Phrase Augmentation, EMNLP, p.136, 2018. ,
Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics, In Journal of Artificial Intelligence Research, issue.26, 2013. ,
Putting Objects in Perspective, CVPR, vol.43, p.132, 2006. ,
The Pragmatics of Analogical Transfer. The Psychology of Learning and Motivation, p.60, 1985. ,
Relation Networks for Object Detection, In CVPR, p.24, 2018. ,
, Binary Image Selection (BISON): Interpretable Evaluation of Visual Grounding, p.37, 2019.
Recognising Human-Object Interaction via Exemplar Based Modelling, ICCV, p.48, 2013. ,
Natural Language Object Retrieval, CVPR, vol.29, p.67, 2016. ,
Modeling Relationships in Referential Expressions with Compositional Modular Networks, CVPR, vol.34, p.135, 2017. ,
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering, CVPR, p.37, 2019. ,
Union Visual Translation Embedding for Visual Relationship Detection and Scene Graph Generation, p.50, 2019. ,
, Robust Visual Relationship Learning, vol.50, p.96, 2018.
Analogy-preserving Semantic Embedding for Visual Object Categorization, ICML, p.62, 2013. ,
Recognizing actions from still images, ICPR, vol.39, p.41, 2008. ,
A Model of Saliency-Based Visual Attention for Rapid Scene Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, p.32, 1998. ,
Segment-phrase table for semantic segmentation, visual entailment and paraphrasing, ICCV, p.95, 2015. ,
Revisiting Visual Question Answering Baselines, ECCV, p.36, 2016. ,
A latent factor model for highly multi-relational data, NIPS, vol.49, p.94, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00776335
Image Retrieval using Scene Graphs, CVPR, vol.68, p.95, 2015. ,
DenseCap: Fully Convolutional Localization Networks for Dense Captioning, CVPR, vol.67, p.142, 2016. ,
CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning, CVPR, p.35, 2017. ,
Inferring and Executing Programs for Visual Reasoning, ICCV, vol.35, p.36, 2017. ,
Image Generation from Scene Graphs, CVPR, vol.30, p.137, 2018. ,
Discriminative clustering for image cosegmentation, CVPR, p.54, 2010. ,
Efficient image and video co-localization with Frank-Wolfe algorithm, ECCV, p.69, 2014. ,
Object grouping based on real-world regularities facilitates perception by reducing competitive interactions in visual cortex, Proceedings of the National Academy of Sciences of the United States of America, 2014. ,
Joint Learning of Object and Action Detectors, ICCV, p.137, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01575804
ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization, ECCV, p.53, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01421772
Deep Visual-Semantic Alignments for Generating Image Descriptions, CVPR, vol.32, p.95, 2015. ,
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping, NIPS, vol.95, p.99, 2014. ,
Compositional Learning for Human Object Interaction, ECCV, vol.58, p.137, 2018. ,
Referring to Objects in Photographs of Natural Scenes, EMNLP, vol.29, p.67, 2014. ,
Multimodal Residual Learning for Visual QA, NIPS, p.31, 2016. ,
Where Do Objects Become Scenes? Cerebral Cortex, 2010. ,
Adam: A Method for Stochastic Optimization, ICLR, p.109, 2015. ,
Semi-supervised classification with graph convolutional networks, ICLR, p.94, 2016. ,
Unifying visual-semantic embeddings with multimodal neural language models, p.31, 2014. ,
Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation, CVPR, p.31, 2015. ,
Detecting Visual Relationships Using Box Attention, p.25, 2018. ,
What Are You Talking About? Text-to-Image Coreference, CVPR, vol.35, p.139, 2014. ,
Visual Coreference Resolution in Visual Dialog using Neural Module Networks, ECCV, p.139, 2018. ,
Fine-grained recognition without part annotations, CVPR, p.131, 2015. ,
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, International Journal of Computer Vision, vol.83, p.141, 2016. ,
ImageNet Classification with Deep Convolutional Neural Networks, NIPS, p.22, 2012. ,
Understanding and generating simple image descriptions, CVPR, vol.26, p.31, 2011. ,
The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale, vol.41, p.137, 2018. ,
Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, CVPR, vol.56, p.57, 2009. ,
From Subcategories to Visual Composites: A Multi-Level Framework for Object Detection, 2013. ,
Learning realistic human actions from movies, CVPR, p.39, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00548659
Is this a wampimuk? Cross-modal mapping between distributional semantics and the visual world, ACL, vol.59, p.69, 2014. ,
Stacked Cross Attention for Image-Text Matching, ECCV, vol.33, p.134, 2018. ,
A conceptual theory of question answering. International Joint Conferences on Artificial Intelligence Organization, p.27, 1977. ,
Extracting Adaptive Contextual Cues from Unlabeled Regions, ICCV, 2011. ,
Automatic Discovery of Groups of Objects for Scene Understanding, CVPR, vol.23, p.68, 2012. ,
Composing Simple Image Descriptions using Web-scale N-grams, CoNLL, p.31, 2011. ,
ViP-CNN: A visual Phrase Reasoning Convolutional Neural Network for Visual Relationship Detection, CVPR, vol.42, p.95, 2017. ,
Scene Graph Generation from Objects, Phrases and Region Captions, ICCV, p.45, 2017. ,
Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection, CVPR, p.45, 2017. ,
Visual Attribute Transfer through Deep Image Analogy, ACM Transactions on Graphics, p.62, 2017. ,
, Natural Language Guided Visual Relationship Detection, p.46, 2017.
Visual Semantic Search: Retrieving Videos via Complex Textual Queries, CVPR, p.35, 2014. ,
Microsoft COCO: Common Objects in Context, ECCV, vol.75, p.107, 2014. ,
Feature Pyramid Networks for Object Detection, CVPR, vol.22, p.108, 2017. ,
A Structured Self-Attentive Sentence Embedding, ICLR, p.135, 2017. ,
iVQA: Inverse Visual Question Answering, CVPR, p.37, 2018. ,
, Context
URL : https://hal.archives-ouvertes.fr/hal-00962015
SSD: Single Shot MultiBox Detector, ECCV, p.22, 2016. ,
Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 2004. ,
Visual Relationship Detection with Language Priors, ECCV, vol.142, p.143, 2016. ,
Hierarchical Question-Image Co-Attention for Visual Question Answering, NIPS, p.32, 2016. ,
Neural Baby Talk, CVPR, vol.33, p.52, 2018. ,
Learning Visual Relation Facts with Semantic Attention for Visual Question Answering, KDD, p.35, 2018. ,
Action Recognition from a Distributed Representation of Pose and Appearance, CVPR, vol.43, p.131, 2011. ,
Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships, NIPS, p.42, 2009. ,
Learning models for actions and person-object interactions with transfer to question answering, ECCV, p.52, 2016. ,
The Stanford CoreNLP Natural Language Processing Toolkit, ACL, vol.51, p.134, 2014. ,
Generation and comprehension of unambiguous object descriptions, CVPR, vol.29, p.67, 2016. ,
, A Visual Question Answering Benchmark Requiring External Knowledge, p.28, 2019.
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, p.25, 1982. ,
Distributed Representations of Words and Phrases and Their Compositionality, NIPS, vol.46, p.109, 2013. ,
WordNet: A Lexical Database for English, Communications of the ACM, p.57, 1992. ,
From Red Wine to Red Tomato: Composition with Context, CVPR, vol.50, p.96, 2017. ,
Vector-based models of semantic composition, ACL, p.51, 2008. ,
Generating Image Descriptions From Computer Vision Detections, EACL, p.31, 2012. ,
Jointly Learning a Knowledge Base of Hierarchy, Relations, and Facts, ACL, vol.46, p.67, 2015. ,
Modeling Context Between Objects for Referring Expression Understanding, ECCV, vol.34, p.53, 2016. ,
Pixels to Graphs by Associative Embedding, NIPS, p.45, 2017. ,
A Review of Relational Machine Learning for Knowledge Graphs, Proceedings of the IEEE, p.49, 2015. ,
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, BMVC, vol.39, p.55, 2006. ,
Learning Conditioned Graph Structures for Interpretable Visual Question Answering, NIPS, p.35, 2018. ,
Visual Scene Perception. Encyclopaedia of Perception, p.25, 2009. ,
Is object localization for free? - Weakly-supervised learning with convolutional neural networks, CVPR, vol.53, p.68, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01015140
Im2Text: Describing Images Using 1 Million Captioned Photographs, NIPS, p.26, 2011. ,
Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs, ICML, vol.54, p.75, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01323727
Scene Recognition and Weakly Supervised Object Localization with Deformable Part-Based Models, ICCV, p.131, 2011. ,
Multimodal explanations: Justifying decisions and pointing to the evidence, CVPR, p.37, 2018. ,
Weakly-Supervised Learning of Visual Relations, ICCV, vol.108, p.113, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01576035
Detecting Unseen Visual Relations Using Analogies, ICCV, p.17, 2019. ,
URL : https://hal.archives-ouvertes.fr/hal-01975760
Visual Relationship Detection Based on Guided Proposals and Semantic Knowledge Distillation ,
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models, ICCV, vol.29, p.95, 2015. ,
Phrase Localization and Visual Relationship Detection with Comprehensive Linguistic Cues, vol.29, p.95, 2017. ,
, Open-vocabulary Phrase Detection, p.136, 2019.
Weakly supervised learning of interactions between humans and objects, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.43, p.67, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00516477
Learning Human-Object Interactions by Graph Parsing Neural Networks, ECCV, vol.42, p.110, 2018. ,
Joint pose estimation and action recognition in image graphs, ICIP, p.41, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-01063329
Learning Semantic Relationships for Better Action Retrieval in Images, CVPR, vol.57, p.124, 2015. ,
YOLO9000: Better, Faster, Stronger, CVPR, 2017. ,
You Only Look Once: Unified, Real-Time Object Detection, CVPR, p.22, 2016. ,
Deep Visual Analogy-Making, NIPS, vol.17, p.96, 2015. ,
Learning What and Where to Draw, NIPS, p.137, 2016. ,
Generative Adversarial Text to Image Synthesis, ICML, p.137, 2016. ,
Exploring Models and Data for Image Question Answering, NIPS, p.27, 2015. ,
Faster R-CNN: Towards real-time object detection with region proposal networks, NIPS, vol.65, p.108, 2015. ,
Grounding of textual phrases in images by reconstruction, vol.33, p.67, 2016. ,
Object Hallucination in Image Captioning, In EMNLP, vol.36, p.37, 2018. ,
Describing Common Human Visual Actions in Images, BMVC, vol.40, p.107, 2015. ,
Learn How to Choose: Independent Detectors Versus Composite Visual Phrases, p.49, 2017. ,
Distinguishing types of superficial similarities: Different effects on the access and use of earlier problems, Journal of Experimental Psychology: Learning, Memory, and Cognition, p.60, 1989. ,
Dynamic Routing Between Capsules, NIPS, p.131, 2017. ,
VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases, CVPR, vol.49, p.95, 2015. ,
Answering Visual Analogy Questions, NIPS, vol.62, p.96, 2015. ,
Recognition using visual phrases, CVPR, vol.141, p.142, 2011. ,
A Simple Neural Network Module for Relational Reasoning, vol.35, p.94, 2017. ,
The Graph Neural Network Model, IEEE Transactions on Neural Networks, p.35, 2009. ,
Recognizing human actions: a local SVM approach, ICPR, p.39, 2004. ,
Bidirectional recurrent neural networks, Signal Processing, p.32, 1997. ,
Generating semantically precise scene graphs from textual descriptions for improved image retrieval, VL@EMNLP, p.52, 2015. ,
Video Visual Relation Detection, ACM International Conference on Multimedia, p.137, 2017. ,
Expanded Parts Model for Human Attribute and Action Recognition in Still Images, CVPR, p.131, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00816144
Scaling Human-Object Interaction Recognition through Zero-Shot Learning, WACV, vol.42, p.95, 2018. ,
Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks, ICCV, p.131, 2015. ,
Unsupervised Discovery of Mid-Level Discriminative Patches, ECCV, 2012. ,
Discovering object categories in image collections, ICCV, p.55, 2005. ,
Reasoning with neural tensor networks for knowledge base completion, NIPS, vol.49, p.67, 2013. ,
Zero-shot learning through cross-modal transfer, NIPS, vol.59, p.69, 2013. ,
One-bit object detection: On learning to localize objects with minimal supervision, ICML, p.52, 2014. ,
ConceptNet 5: A Large Semantic Network for Relational Knowledge, p.47, 2013. ,
Natural Scene Statistics Account for the Representation of Scene Categories in Human Visual Cortex, Neuron, issue.7, 2013. ,
Interobject grouping facilitates visual awareness, Journal of Vision, issue.6, 2015. ,
YAGO: a core of semantic knowledge, WWW, p.46, 2007. ,
URL : https://hal.archives-ouvertes.fr/hal-01472497
Modelling Relational Data using Bayesian Clustered Tensor Factorization, NIPS, p.49, 2009. ,
Graph-Structured Representations for Visual Question Answering, CVPR, vol.30, p.35, 2017. ,
Pose primitive based human action recognition in videos or still images, CVPR, p.41, 2008. ,
Selective Search for Object Recognition, International Journal of Computer Vision, vol.21, p.22, 2013. ,
Visualizing Data using t-SNE, Journal of Machine Learning Research, vol.122, p.128, 2008. ,
CIDEr: Consensus-based image description evaluation, CVPR, p.36, 2015. ,
Context-Aware Captions from Context-Agnostic Supervision, CVPR, p.34, 2017. ,
Order-embeddings of images and language, ICLR, p.58, 2016. ,
Captioning Images with Diverse Objects, CVPR, vol.37, p.69, 2017. ,
Learning Deep Structure-Preserving Image-Text Embeddings, CVPR, vol.98, p.99, 2016. ,
Structured Matching for Phrase Localization, ECCV, p.34, 2016. ,
Explicit knowledge-based reasoning for visual question answering, IJCAI, p.28, 2017. ,
FVQA: Fact-Based Visual Question Answering, IEEE Transactions on Pattern Analysis and Machine Intelligence, p.28, 2018. ,
Designing deep networks for surface normal estimation, CVPR, p.43, 2015. ,
Unsupervised Discovery of Action Classes, CVPR, p.55, 2006. ,
LinkNet: Relational Embedding for Scene Graph, NIPS, p.45, 2018. ,
Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge from External Sources, CVPR, p.28, 2016. ,
Visual Question Answering: A Survey of Methods and Datasets. Computer Vision and Image Understanding, p.33, 2017. ,
Latent Embeddings for Zero-shot Classification, CVPR, p.69, 2016. ,
Weakly-Supervised Visual Grounding of Phrases with Linguistic Structures, CVPR, p.34, 2017. ,
, Visual Entailment Task for Visually-Grounded Language Learning, p.37, 2018.
Scene Graph Generation by Iterative Message Passing, CVPR, vol.30, p.46, 2017. ,
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, ICML, p.32, 2015. ,
Maximum Margin Clustering, NIPS, p.54, 2004. ,
Deep correlation for matching images and text, CVPR, p.31, 2015. ,
Graph R-CNN for scene graph generation, ECCV, vol.24, p.45, 2018. ,
Unsupervised Template Learning for Fine-Grained Object Recognition, NIPS, p.131, 2012. ,
Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features, ECCV, p.42, 2018. ,
Stacked Attention Networks for Image Question Answering, CVPR, p.32, 2016. ,
Grouplet: A structured image representation for recognizing human and object interactions, CVPR, vol.41, p.67, 2010. ,
Modeling mutual context of object and human pose in human-object interaction activities, CVPR, vol.39, p.131, 2010. ,
Action Recognition with Exemplar Based 2.5D Graph Matching, ECCV, p.41, 2012. ,
Human Action Recognition by Learning Bases of Action Attributes and Parts, ICCV, p.67, 2011. ,
Exploring Visual Relationship for Image Captioning, ECCV, vol.30, p.35, 2018. ,
Stating the Obvious: Extracting Visual Common Sense Knowledge, NAACL, p.67, 2016. ,
Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition, ECCV, p.46, 2018. ,
Modeling Context in Referring Expressions, ECCV, vol.34, p.53, 2016. ,
MAttNet: Modular Attention Network for Referring Expression Comprehension, CVPR, p.35, 2018. ,
Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation, ICCV, vol.46, p.95, 2017. ,
A MultiPath Network for Object Detection, BMVC, p.65, 2016. ,
Neural Motifs: Scene Graph Parsing with Global Context, CVPR, p.42, 2018. ,
From Recognition to Cognition: Visual Commonsense Reasoning, CVPR, p.37, 2019. ,
Visual Translation Embedding Network for Visual Relation Detection, CVPR, vol.43, p.96, 2017. ,
, Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN. In ICCV, p.53, 2017.
Relationship Proposal Networks, CVPR, vol.24, p.132, 2017. ,
Large-Scale Visual Relationship Understanding, AAAI, vol.40, p.96, 2019. ,
Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks, ICCV, vol.43, p.138, 2017. ,
Reasoning about object affordances in a knowledge base representation, ECCV, vol.47, p.67, 2014. ,
Visual7W: Grounded Question Answering in Images, CVPR, vol.27, p.32, 2016. ,
Towards Context-aware Interaction Recognition for Visual Relationship Detection, ICCV, vol.43, p.95, 2017. ,
HCVRD: a benchmark for large-scale Human-Centered Visual Relationship Detection, In AAAI, vol.13, p.40, 2018. ,
Edge Boxes: Locating Object Proposals from Edges, ECCV, 2014. ,
Learning the visual interpretation of sentences, ICCV, p.35, 2013. ,