Related papers: Logically Consistent Loss for Visual Question Answ…

200 papers

Despite considerable recent progress in Visual Question Answering (VQA) models, inconsistent or contradictory answers continue to cast doubt on their true reasoning capabilities. However, most proposed methods use indirect strategies or…

Computer Vision and Pattern Recognition · Computer Science 2023-03-17 Sergio Tascon-Morales, Pablo Márquez-Neila, Raphael Sznitman

Visual Question Answering (VQA) models take an image and a natural-language question as input and infer the answer to the question. Recently, VQA systems in medical imaging have gained popularity thanks to potential advantages such as…

Computer Vision and Pattern Recognition · Computer Science 2022-06-28 Sergio Tascon-Morales, Pablo Márquez-Neila, Raphael Sznitman

Logical connectives and their implications on the meaning of a natural language sentence are a fundamental aspect of understanding. In this paper, we investigate whether visual question answering (VQA) systems trained to answer a question…

Computer Vision and Pattern Recognition · Computer Science 2020-07-17 Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

Visual question answering, as a recently proposed multimodal learning task, has enjoyed wide attention from the deep learning community. Lately, the focus was on developing new representation fusion methods and attention mechanisms to achieve…

Computer Vision and Pattern Recognition · Computer Science 2017-08-03 Ilija Ilievski, Jiashi Feng

Visual Question Answering (VQA) requires reasoning across visual and textual modalities, yet Large Vision-Language Models (LVLMs) often lack integrated commonsense knowledge, limiting their robustness in real-world scenarios. To address…

Computation and Language · Computer Science 2025-06-12 Shuo Yang, Siwen Luo, Soyeon Caren Han, Eduard Hovy

Despite significant progress in Visual Question Answering over the years, the robustness of today's VQA models leaves much to be desired. We introduce a new evaluation protocol and associated dataset (VQA-Rephrasings) and show that…

Computer Vision and Pattern Recognition · Computer Science 2019-02-18 Meet Shah, Xinlei Chen, Marcus Rohrbach, Devi Parikh

The predominant approach to Visual Question Answering (VQA) demands that the model represents within its weights all of the information required to answer any question about any image. Learning this information from any real training set…

Computer Vision and Pattern Recognition · Computer Science 2017-11-23 Damien Teney, Anton van den Hengel

Visual Question Answering (VQA) requires models to reason over multimodal information, combining visual and textual data. With the development of continual learning, significant progress has been made in retaining knowledge and adapting to…

Computer Vision and Pattern Recognition · Computer Science 2026-01-06 Zhifei Li, Yiran Wang, Chenyi Xiong, Yujing Xia, Xiaoju Hou, Yue Zhao, Miao Zhang, Kui Xiao, Bing Yang

In continual visual question answering (VQA), existing Continual Learning (CL) methods are mostly built for symmetric, unimodal architectures. However, modern Vision-Language Models (VLMs) violate this assumption, as their trainable…

Computer Vision and Pattern Recognition · Computer Science 2026-04-17 Peifeng Zhang, Zice Qiu, Donghua Yu, Shilei Cao, Juepeng Zheng, Yutong Lu, Haohuan Fu

Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations. In this paper, we propose a novel approach to address this issue based on modular networks, which creates two questions related by…

Computer Vision and Pattern Recognition · Computer Science 2020-12-24 Spencer Whitehead, Hui Wu, Yi Ren Fung, Heng Ji, Rogerio Feris, Kate Saenko

Many natural language questions require qualitative, quantitative or logical comparisons between two entities or events. This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions by…

Computation and Language · Computer Science 2020-05-26 Akari Asai, Hannaneh Hajishirzi

Understanding images and text together is an important aspect of cognition and building advanced Artificial Intelligence (AI) systems. As a community, we have achieved good benchmarks over language and vision domains separately, however…

Computer Vision and Pattern Recognition · Computer Science 2020-11-19 Shailaja Keyur Sampat, Yezhou Yang, Chitta Baral

Over the last twenty years, significant progress has been made in designing and implementing Question Answering (QA) systems. However, addressing complex questions, the answers to which are spread across multiple documents, remains a…

Computation and Language · Computer Science 2026-02-26 Sourav Saha, Dwaipayan Roy, Mandar Mitra

Visual question answering requires a system to provide an accurate natural language answer given an image and a natural language question. However, it is widely recognized that previous generic VQA methods often exhibit a tendency to…

Computer Vision and Pattern Recognition · Computer Science 2024-02-20 Jie Ma, Pinghui Wang, Dechen Kong, Zewei Wang, Jun Liu, Hongbin Pei, Junzhou Zhao

Typical active learning strategies are designed for tasks, such as classification, with the assumption that the output space is mutually exclusive. The assumption that these tasks always have exactly one correct answer has resulted in the…

Computer Vision and Pattern Recognition · Computer Science 2019-12-10 Khaled Jedoui, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

Visual Question Answering (VQA) is a challenging task that requires systems to provide accurate answers to questions based on image content. Current VQA models struggle with complex questions due to limitations in capturing and integrating…

Computer Vision and Pattern Recognition · Computer Science 2024-09-24 Peiyuan Chen, Zecheng Zhang, Yiping Dong, Li Zhou, Han Wang

Knowledge-based visual question answering (KB-VQA) demonstrates significant potential for handling knowledge-intensive tasks. However, conflicts arise between static parametric knowledge in vision language models (VLMs) and dynamically…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Yuyang Hong, Jiaqi Gu, Yujin Lou, Lubin Fan, Qi Yang, Ying Wang, Kun Ding, Yue Wu, Shiming Xiang, Jieping Ye

Multi-modal reasoning in visual question answering (VQA) has witnessed rapid progress recently. However, most reasoning models heavily rely on shortcuts learned from training data, which prevents their usage in challenging real-world…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Qi Zheng, Chaoyue Wang, Daqing Liu, Dadong Wang, Dacheng Tao

In high-stakes medical applications, consistent answering across diverse question phrasings is essential for reliable diagnosis. However, we reveal that current Medical Vision-Language Models (Med-VLMs) exhibit concerning fragility in…

Computation and Language · Computer Science 2025-08-27 Songtao Jiang, Yuxi Chen, Sibo Song, Yan Zhang, Yeying Jin, Yang Feng, Jian Wu, Zuozhu Liu

Even though there has been tremendous progress in the field of Visual Question Answering, models today still tend to be inconsistent and brittle. To this end, we propose a model-independent cyclic framework which increases consistency and…

Computer Vision and Pattern Recognition · Computer Science 2020-07-10 Vatsal Goel, Mohit Chandak, Ashish Anand, Prithwijit Guha