Related papers: Logically Consistent Loss for Visual Question Answ…

Logical Implications for Visual Question Answering Consistency

Despite considerable recent progress in Visual Question Answering (VQA) models, inconsistent or contradictory answers continue to cast doubt on their true reasoning capabilities. However, most proposed methods use indirect strategies or…

Computer Vision and Pattern Recognition · Computer Science 2023-03-17 Sergio Tascon-Morales, Pablo Márquez-Neila, Raphael Sznitman

Consistency-preserving Visual Question Answering in Medical Imaging

Visual Question Answering (VQA) models take an image and a natural-language question as input and infer the answer to the question. Recently, VQA systems in medical imaging have gained popularity thanks to potential advantages such as…

Computer Vision and Pattern Recognition · Computer Science 2022-06-28 Sergio Tascon-Morales, Pablo Márquez-Neila, Raphael Sznitman

VQA-LOL: Visual Question Answering under the Lens of Logic

Logical connectives and their implications on the meaning of a natural language sentence are a fundamental aspect of understanding. In this paper, we investigate whether visual question answering (VQA) systems trained to answer a question…

Computer Vision and Pattern Recognition · Computer Science 2020-07-17 Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models

Visual question answering, as a recently proposed multimodal learning task, has enjoyed wide attention from the deep learning community. Lately, the focus was on developing new representation fusion methods and attention mechanisms to achieve…

Computer Vision and Pattern Recognition · Computer Science 2017-08-03 Ilija Ilievski, Jiashi Feng

MAGIC-VQA: Multimodal And Grounded Inference with Commonsense Knowledge for Visual Question Answering

Visual Question Answering (VQA) requires reasoning across visual and textual modalities, yet Large Vision-Language Models (LVLMs) often lack integrated commonsense knowledge, limiting their robustness in real-world scenarios. To address…

Computation and Language · Computer Science 2025-06-12 Shuo Yang, Siwen Luo, Soyeon Caren Han, Eduard Hovy

Cycle-Consistency for Robust Visual Question Answering

Despite significant progress in Visual Question Answering over the years, robustness of today's VQA models leaves much to be desired. We introduce a new evaluation protocol and associated dataset (VQA-Rephrasings) and show that…

Computer Vision and Pattern Recognition · Computer Science 2019-02-18 Meet Shah, Xinlei Chen, Marcus Rohrbach, Devi Parikh

Visual Question Answering as a Meta Learning Task

The predominant approach to Visual Question Answering (VQA) demands that the model represents within its weights all of the information required to answer any question about any image. Learning this information from any real training set…

Computer Vision and Pattern Recognition · Computer Science 2017-11-23 Damien Teney, Anton van den Hengel

MacVQA: Adaptive Memory Allocation and Global Noise Filtering for Continual Visual Question Answering

Visual Question Answering (VQA) requires models to reason over multimodal information, combining visual and textual data. With the development of continual learning, significant progress has been made in retaining knowledge and adapting to…

Computer Vision and Pattern Recognition · Computer Science 2026-01-06 Zhifei Li, Yiran Wang, Chenyi Xiong, Yujing Xia, Xiaoju Hou, Yue Zhao, Miao Zhang, Kui Xiao, Bing Yang

AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning

In continual visual question answering (VQA), existing Continual Learning (CL) methods are mostly built for symmetric, unimodal architectures. However, modern Vision-Language Models (VLMs) violate this assumption, as their trainable…

Computer Vision and Pattern Recognition · Computer Science 2026-04-17 Peifeng Zhang, Zice Qiu, Donghua Yu, Shilei Cao, Juepeng Zheng, Yutong Lu, Haohuan Fu

Learning from Lexical Perturbations for Consistent Visual Question Answering

Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations. In this paper, we propose a novel approach to address this issue based on modular networks, which creates two questions related by…

Computer Vision and Pattern Recognition · Computer Science 2020-12-24 Spencer Whitehead, Hui Wu, Yi Ren Fung, Heng Ji, Rogerio Feris, Kate Saenko

Logic-Guided Data Augmentation and Regularization for Consistent Question Answering

Many natural language questions require qualitative, quantitative or logical comparisons between two entities or events. This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions by…

Computation and Language · Computer Science 2020-05-26 Akari Asai, Hannaneh Hajishirzi

Visuo-Linguistic Question Answering (VLQA) Challenge

Understanding images and text together is an important aspect of cognition and building advanced Artificial Intelligence (AI) systems. As a community, we have achieved good benchmarks over language and vision domains separately, however…

Computer Vision and Pattern Recognition · Computer Science 2020-11-19 Shailaja Keyur Sampat, Yezhou Yang, Chitta Baral

LiCQA: A Lightweight Complex Question Answering System

Over the last twenty years, significant progress has been made in designing and implementing Question Answering (QA) systems. However, addressing complex questions, the answers to which are spread across multiple documents, remains a…

Computation and Language · Computer Science 2026-02-26 Sourav Saha, Dwaipayan Roy, Mandar Mitra

Robust Visual Question Answering: Datasets, Methods, and Future Challenges

Visual question answering requires a system to provide an accurate natural language answer given an image and a natural language question. However, it is widely recognized that previous generic VQA methods often exhibit a tendency to…

Computer Vision and Pattern Recognition · Computer Science 2024-02-20 Jie Ma, Pinghui Wang, Dechen Kong, Zewei Wang, Jun Liu, Hongbin Pei, Junzhou Zhao

Deep Bayesian Active Learning for Multiple Correct Outputs

Typical active learning strategies are designed for tasks, such as classification, with the assumption that the output space is mutually exclusive. The assumption that these tasks always have exactly one correct answer has resulted in the…

Computer Vision and Pattern Recognition · Computer Science 2019-12-10 Khaled Jedoui, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion

Visual Question Answering (VQA) is a challenging task that requires systems to provide accurate answers to questions based on image content. Current VQA models struggle with complex questions due to limitations in capturing and integrating…

Computer Vision and Pattern Recognition · Computer Science 2024-09-24 Peiyuan Chen, Zecheng Zhang, Yiping Dong, Li Zhou, Han Wang

CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering

Knowledge-based visual question answering (KB-VQA) demonstrates significant potential for handling knowledge-intensive tasks. However, conflicts arise between static parametric knowledge in vision language models (VLMs) and dynamically…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Yuyang Hong, Jiaqi Gu, Yujin Lou, Lubin Fan, Qi Yang, Ying Wang, Kun Ding, Yue Wu, Shiming Xiang, Jieping Ye

Cross-Modal Contrastive Learning for Robust Reasoning in VQA

Multi-modal reasoning in visual question answering (VQA) has witnessed rapid progress recently. However, most reasoning models heavily rely on shortcuts learned from training data, which prevents their usage in challenging real-world…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Qi Zheng, Chaoyue Wang, Daqing Liu, Dadong Wang, Dacheng Tao

Knowing or Guessing? Robust Medical Visual Question Answering via Joint Consistency and Contrastive Learning

In high-stakes medical applications, consistent answering across diverse question phrasings is essential for reliable diagnosis. However, we reveal that current Medical Vision-Language Models (Med-VLMs) exhibit concerning fragility in…

Computation and Language · Computer Science 2025-08-27 Songtao Jiang, Yuxi Chen, Sibo Song, Yan Zhang, Yeying Jin, Yang Feng, Jian Wu, Zuozhu Liu

IQ-VQA: Intelligent Visual Question Answering

Even though there has been tremendous progress in the field of Visual Question Answering, models today still tend to be inconsistent and brittle. To this end, we propose a model-independent cyclic framework which increases consistency and…

Computer Vision and Pattern Recognition · Computer Science 2020-07-10 Vatsal Goel, Mohit Chandak, Ashish Anand, Prithwijit Guha