Related papers: Grounding Visual Explanations

Grounding Visual Explanations (Extended Abstract)

Existing models which generate textual explanations enforce task relevance through a discriminative term loss function, but such mechanisms only weakly constrain mentioned object parts to actually be present in the image. In this paper, a…

Computer Vision and Pattern Recognition · Computer Science 2017-11-20 Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

Object-Centric Diagnosis of Visual Reasoning

When answering questions about an image, it not only needs knowing what -- understanding the fine-grained contents (e.g., objects, relationships) in the image, but also telling why -- reasoning over grounding visual cues to derive the…

Computer Vision and Pattern Recognition · Computer Science 2020-12-22 Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan

Extending Phrase Grounding with Pronouns in Visual Dialogues

Conventional phrase grounding aims to localize noun phrases mentioned in a given caption to their corresponding image regions, which has achieved great success recently. Apparently, sole noun phrase grounding is not enough for cross-modal…

Computation and Language · Computer Science 2022-10-25 Panzhong Lu, Xin Zhang, Meishan Zhang, Min Zhang

Generating Visual Explanations

Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text;…

Computer Vision and Pattern Recognition · Computer Science 2016-03-29 Lisa Anne Hendricks, Zeynep Akata, Marcus Rohrbach, Jeff Donahue, Bernt Schiele, Trevor Darrell

REX: Reasoning-aware and Grounded Explanation

Effectiveness and interpretability are two essential properties for trustworthy AI systems. Most recent studies in visual reasoning are dedicated to improving the accuracy of predicted answers, and less attention is paid to explaining the…

Computer Vision and Pattern Recognition · Computer Science 2022-03-14 Shi Chen, Qi Zhao

Be Careful When Evaluating Explanations Regarding Ground Truth

Evaluating explanations of image classifiers regarding ground truth, e.g. segmentation masks defined by human perception, primarily evaluates the quality of the models under consideration rather than the explanation methods themselves.…

Computer Vision and Pattern Recognition · Computer Science 2023-11-09 Hubert Baniecki, Maciej Chrabaszcz, Andreas Holzinger, Bastian Pfeifer, Anna Saranti, Przemyslaw Biecek

Grounded Semantic Composition for Visual Scenes

We present a visually-grounded language understanding model based on a study of how people verbally describe objects in scenes. The emphasis of the model is on the combination of individual word meanings to produce meanings for complex…

Artificial Intelligence · Computer Science 2011-07-04 P. Gorniak, D. Roy

Visual correspondence-based explanations improve AI robustness and human-AI team accuracy

Explaining artificial intelligence (AI) predictions is increasingly important and even imperative in many high-stakes applications where humans are the ultimate decision-makers. In this work, we propose two novel architectures of…

Computer Vision and Pattern Recognition · Computer Science 2023-09-01 Giang Nguyen, Mohammad Reza Taesiri, Anh Nguyen

Learning Cross-modal Context Graph for Visual Grounding

Visual grounding is a ubiquitous building block in many vision-language tasks and yet remains challenging due to large variations in visual and linguistic features of grounding entities, strong context effect and the resulting semantic…

Computer Vision and Pattern Recognition · Computer Science 2019-11-26 Yongfei Liu, Bo Wan, Xiaodan Zhu, Xuming He

Explainable Deep Classification Models for Domain Generalization

Conventionally, AI models are thought to trade off explainability for lower accuracy. We develop a training strategy that not only leads to a more explainable AI system for object classification, but as a consequence, suffers no perceptible…

Computer Vision and Pattern Recognition · Computer Science 2020-03-17 Andrea Zunino, Sarah Adel Bargal, Riccardo Volpi, Mehrnoosh Sameki, Jianming Zhang, Stan Sclaroff, Vittorio Murino, Kate Saenko

Leveraging Conditional Generative Models in a General Explanation Framework of Classifier Decisions

Providing a human-understandable explanation of classifiers' decisions has become imperative to generate trust in their use for day-to-day tasks. Although many works have addressed this problem by generating visual explanation maps, they…

Machine Learning · Computer Science 2021-06-22 Martin Charachon, Paul-Henry Cournède, Céline Hudelot, Roberto Ardon

Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis

Explanations for AI models in high-stakes domains like medicine often lack verifiability, which can hinder trust. To address this, we propose an interactive agent that produces explanations through an auditable sequence of actions. The…

Artificial Intelligence · Computer Science 2025-11-04 Yuhang Huang, Zekai Lin, Fan Zhong, Lei Liu

Do Users Benefit From Interpretable Vision? A User Study, Baseline, And Dataset

A variety of methods exist to explain image classification models. However, whether they provide any benefit to users over simply comparing various inputs and the model's respective predictions remains unclear. We conducted a user study…

Machine Learning · Computer Science 2022-04-26 Leon Sixt, Martin Schuessler, Oana-Iuliana Popescu, Philipp Weiß, Tim Landgraf

Learning to Generate Grounded Visual Captions without Localization Supervision

When automatically generating a sentence description for an image or video, it often remains unclear how well the generated caption is grounded, that is whether the model uses the correct image regions to output particular words, or if the…

Computer Vision and Pattern Recognition · Computer Science 2020-07-21 Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira

Improved Visual Grounding through Self-Consistent Explanations

Vision-and-language models trained to match images with text can be combined with visual explanation methods to point to the locations of specific objects in an image. Our work shows that the localization --"grounding"-- abilities of these…

Computer Vision and Pattern Recognition · Computer Science 2023-12-08 Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez

Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning

Visual language grounding is widely studied in modern neural image captioning systems, which typically adopt an encoder-decoder framework consisting of two principal components: a convolutional neural network (CNN) for image feature…

Computer Vision and Pattern Recognition · Computer Science 2018-05-23 Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Cho-Jui Hsieh

A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models

Key to tasks that require reasoning about natural language in visual contexts is grounding words and phrases to image regions. However, observing this grounding in contemporary models is complex, even if it is generally expected to take…

Computation and Language · Computer Science 2024-06-03 Noriyuki Kojima, Hadar Averbuch-Elor, Yoav Artzi

Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge

Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based,…

Computation and Language · Computer Science 2022-03-31 Danny Merkx, Stefan L. Frank, Mirjam Ernestus

Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models

Large vision-and-language models (VLMs) trained to match images with text on large-scale datasets of image-text pairs have shown impressive generalization ability on several vision and language tasks. Several recent works, however, showed…

Computer Vision and Pattern Recognition · Computer Science 2024-03-07 Navid Rajabi, Jana Kosecka

Improving Image Captioning with Better Use of Captions

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu