Related papers: Grounding Visual Explanations

200 papers

Existing models which generate textual explanations enforce task relevance through a discriminative term loss function, but such mechanisms only weakly constrain mentioned object parts to actually be present in the image. In this paper, a…

Computer Vision and Pattern Recognition · Computer Science 2017-11-20 Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

When answering questions about an image, it not only needs knowing what -- understanding the fine-grained contents (e.g., objects, relationships) in the image, but also telling why -- reasoning over grounding visual cues to derive the…

Computer Vision and Pattern Recognition · Computer Science 2020-12-22 Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan

Conventional phrase grounding aims to localize noun phrases mentioned in a given caption to their corresponding image regions, which has achieved great success recently. Apparently, sole noun phrase grounding is not enough for cross-modal…

Computation and Language · Computer Science 2022-10-25 Panzhong Lu, Xin Zhang, Meishan Zhang, Min Zhang

Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text;…

Computer Vision and Pattern Recognition · Computer Science 2016-03-29 Lisa Anne Hendricks, Zeynep Akata, Marcus Rohrbach, Jeff Donahue, Bernt Schiele, Trevor Darrell

Effectiveness and interpretability are two essential properties for trustworthy AI systems. Most recent studies in visual reasoning are dedicated to improving the accuracy of predicted answers, and less attention is paid to explaining the…

Computer Vision and Pattern Recognition · Computer Science 2022-03-14 Shi Chen, Qi Zhao

Evaluating explanations of image classifiers regarding ground truth, e.g. segmentation masks defined by human perception, primarily evaluates the quality of the models under consideration rather than the explanation methods themselves.…

Computer Vision and Pattern Recognition · Computer Science 2023-11-09 Hubert Baniecki, Maciej Chrabaszcz, Andreas Holzinger, Bastian Pfeifer, Anna Saranti, Przemyslaw Biecek

We present a visually-grounded language understanding model based on a study of how people verbally describe objects in scenes. The emphasis of the model is on the combination of individual word meanings to produce meanings for complex…

Artificial Intelligence · Computer Science 2011-07-04 P. Gorniak, D. Roy

Explaining artificial intelligence (AI) predictions is increasingly important and even imperative in many high-stakes applications where humans are the ultimate decision-makers. In this work, we propose two novel architectures of…

Computer Vision and Pattern Recognition · Computer Science 2023-09-01 Giang Nguyen, Mohammad Reza Taesiri, Anh Nguyen

Visual grounding is a ubiquitous building block in many vision-language tasks and yet remains challenging due to large variations in visual and linguistic features of grounding entities, strong context effect and the resulting semantic…

Computer Vision and Pattern Recognition · Computer Science 2019-11-26 Yongfei Liu, Bo Wan, Xiaodan Zhu, Xuming He

Conventionally, AI models are thought to trade off explainability for lower accuracy. We develop a training strategy that not only leads to a more explainable AI system for object classification, but as a consequence, suffers no perceptible…

Computer Vision and Pattern Recognition · Computer Science 2020-03-17 Andrea Zunino, Sarah Adel Bargal, Riccardo Volpi, Mehrnoosh Sameki, Jianming Zhang, Stan Sclaroff, Vittorio Murino, Kate Saenko

Providing a human-understandable explanation of classifiers' decisions has become imperative to generate trust in their use for day-to-day tasks. Although many works have addressed this problem by generating visual explanation maps, they…

Machine Learning · Computer Science 2021-06-22 Martin Charachon, Paul-Henry Cournède, Céline Hudelot, Roberto Ardon

Explanations for AI models in high-stakes domains like medicine often lack verifiability, which can hinder trust. To address this, we propose an interactive agent that produces explanations through an auditable sequence of actions. The…

Artificial Intelligence · Computer Science 2025-11-04 Yuhang Huang, Zekai Lin, Fan Zhong, Lei Liu

A variety of methods exist to explain image classification models. However, whether they provide any benefit to users over simply comparing various inputs and the model's respective predictions remains unclear. We conducted a user study…

Machine Learning · Computer Science 2022-04-26 Leon Sixt, Martin Schuessler, Oana-Iuliana Popescu, Philipp Weiß, Tim Landgraf

When automatically generating a sentence description for an image or video, it often remains unclear how well the generated caption is grounded, that is whether the model uses the correct image regions to output particular words, or if the…

Computer Vision and Pattern Recognition · Computer Science 2020-07-21 Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira

Vision-and-language models trained to match images with text can be combined with visual explanation methods to point to the locations of specific objects in an image. Our work shows that the localization --"grounding"-- abilities of these…

Computer Vision and Pattern Recognition · Computer Science 2023-12-08 Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez

Visual language grounding is widely studied in modern neural image captioning systems, which typically adopts an encoder-decoder framework consisting of two principal components: a convolutional neural network (CNN) for image feature…

Computer Vision and Pattern Recognition · Computer Science 2018-05-23 Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Cho-Jui Hsieh

Key to tasks that require reasoning about natural language in visual contexts is grounding words and phrases to image regions. However, observing this grounding in contemporary models is complex, even if it is generally expected to take…

Computation and Language · Computer Science 2024-06-03 Noriyuki Kojima, Hadar Averbuch-Elor, Yoav Artzi

Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based,…

Computation and Language · Computer Science 2022-03-31 Danny Merkx, Stefan L. Frank, Mirjam Ernestus

Large vision-and-language models (VLMs) trained to match images with text on large-scale datasets of image-text pairs have shown impressive generalization ability on several vision and language tasks. Several recent works, however, showed…

Computer Vision and Pattern Recognition · Computer Science 2024-03-07 Navid Rajabi, Jana Kosecka

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu