Related papers: Weakly-Supervised Visual-Retriever-Reader for Know…

Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering

Visual question answering (VQA) is a multidisciplinary research problem that is pursued through practices of natural language processing and computer vision. Visual question answering automatically answers natural language questions according…

Computer Vision and Pattern Recognition · Computer Science 2024-09-01 Param Ahir, Hiteishi Diwanji

A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA

Knowledge-based Visual Question Answering (VQA) expects models to rely on external knowledge for robust answer prediction. Though this is significant, this paper discovers several leading factors impeding the advancement of current…

Computer Vision and Pattern Recognition · Computer Science 2022-07-01 Yangyang Guo, Liqiang Nie, Yongkang Wong, Yibing Liu, Zhiyong Cheng, Mohan Kankanhalli

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

Visual Question Answering (VQA) in its ideal form lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA benchmarks to date are focused on questions…

Computer Vision and Pattern Recognition · Computer Science 2019-09-05 Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi

Knowledge Condensation and Reasoning for Knowledge-based VQA

Knowledge-based visual question answering (KB-VQA) is a challenging task, which requires the model to leverage external knowledge for comprehending and answering questions grounded in visual content. Recent studies retrieve the knowledge…

Computer Vision and Pattern Recognition · Computer Science 2024-03-18 Dongze Hao, Jian Jia, Longteng Guo, Qunbo Wang, Te Yang, Yan Li, Yanhua Cheng, Bo Wang, Quan Chen, Han Li, Jing Liu

Retrieval Augmented Visual Question Answering with Outside Knowledge

Outside-Knowledge Visual Question Answering (OK-VQA) is a challenging VQA task that requires retrieval of external knowledge to answer questions about images. Recent OK-VQA systems use Dense Passage Retrieval (DPR) to retrieve documents…

Computation and Language · Computer Science 2022-11-01 Weizhe Lin, Bill Byrne

Learning to Search: A Decision-Based Agent for Knowledge-Based Visual Question Answering

Knowledge-based visual question answering (KB-VQA) requires vision-language models to understand images and use external knowledge, especially for rare entities and long-tail facts. Most existing retrieval-augmented generation (RAG) methods…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Zhuohong Chen, Zhenxian Wu, Yunyao Yu, Hangrui Xu, Zirui Liao, Zhifang Liu, Xiangwen Deng, Pen Jiao, Haoqian Wang

Visual Question Answering as Reading Comprehension

Visual question answering (VQA) demands simultaneous comprehension of both the image visual content and natural language questions. In some cases, the reasoning needs the help of common sense or general knowledge which usually appears in the…

Computer Vision and Pattern Recognition · Computer Science 2018-11-30 Hui Li, Peng Wang, Chunhua Shen, Anton van den Hengel

Open-Set Knowledge-Based Visual Question Answering with Inference Paths

Given an image and an associated textual question, the purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases. Prior KB-VQA models are usually…

Machine Learning · Computer Science 2023-10-13 Jingru Gan, Xinzhe Han, Shuhui Wang, Qingming Huang

Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base

The knowledge-based visual question answering (KVQA) task aims to answer questions that require additional external knowledge as well as an understanding of images and questions. Recent studies on KVQA inject external knowledge in a…

Computer Vision and Pattern Recognition · Computer Science 2022-07-28 Jinyeong Chae, Jihie Kim

REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

This paper revisits visual representation in knowledge-based visual question answering (VQA) and demonstrates that using regional information in a better way can significantly improve the performance. While visual representation is…

Computer Vision and Pattern Recognition · Computer Science 2022-10-11 Yuanze Lin, Yujia Xie, Dongdong Chen, Yichong Xu, Chenguang Zhu, Lu Yuan

Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering

Knowledge-based visual question answering (KB-VQA) requires visual language models (VLMs) to integrate visual understanding with external knowledge retrieval. Although retrieval-augmented generation (RAG) achieves significant advances in…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Yuyang Hong, Jiaqi Gu, Qi Yang, Lubin Fan, Yue Wu, Ying Wang, Kun Ding, Shiming Xiang, Jieping Ye

Visual Question Answering using Deep Learning: A Survey and Performance Analysis

The Visual Question Answering (VQA) task combines challenges for processing data with both Visual and Linguistic processing, to answer basic 'common sense' questions about given images. Given an image and a question in natural language, the…

Computer Vision and Pattern Recognition · Computer Science 2020-12-24 Yash Srivastava, Vaishnav Murali, Shiv Ram Dubey, Snehasis Mukherjee

A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge

The Visual Question Answering (VQA) task aspires to provide a meaningful testbed for the development of AI models that can jointly reason over visual and natural language inputs. Despite a proliferation of VQA datasets, this goal is…

Computer Vision and Pattern Recognition · Computer Science 2022-06-06 Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, Roozbeh Mottaghi

VQA: Visual Question Answering

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios,…

Computation and Language · Computer Science 2016-10-28 Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh

Visual Question Answering: A Survey of Methods and Datasets

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires…

Computer Vision and Pattern Recognition · Computer Science 2016-07-21 Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel

QKVQA: Question-Focused Filtering for Knowledge-based VQA

Visual Question Answering (VQA) is the task of answering questions based on image content. Building upon this, Knowledge-Based VQA (KB-VQA) requires models to answer questions that depend on external knowledge beyond the visual content of…

Information Retrieval · Computer Science 2026-04-08 Wei Ye, Yixin Su, Yueguo Chen, Longxiang Gao, Jianjun Li, Ruixuan Li, Rui Zhang

Generating Question Relevant Captions to Aid Visual Question Answering

Visual question answering (VQA) and image captioning require a shared body of general knowledge connecting language and vision. We present a novel approach to improve VQA performance that exploits this connection by jointly generating…

Computer Vision and Pattern Recognition · Computer Science 2020-01-07 Jialin Wu, Zeyuan Hu, Raymond J. Mooney

A survey on VQA_Datasets and Approaches

Visual question answering (VQA) is a task that combines both the techniques of computer vision and natural language processing. It requires models to answer a text-based question according to the information contained in a visual. In recent…

Computer Vision and Pattern Recognition · Computer Science 2021-05-04 Yeyun Zou, Qiyu Xie

Analysis of Visual Question Answering Algorithms with attention model

Visual question answering (VQA) uses image processing algorithms to process the image and natural language processing methods to understand and answer the question. VQA is helpful to a visually impaired person, can be used for the security…

Computer Vision and Pattern Recognition · Computer Science 2023-05-31 Param Ahir, Hiteishi M. Diwanji

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

Visual Question Answering (VQA) benchmarks have largely emphasized perception-based tasks that can be solved from visual content alone. In contrast, many real-world scenarios require external knowledge that is not directly observable in the…

Computer Vision and Pattern Recognition · Computer Science 2026-05-21 Basel Shbita, Pengyuan Li, Anna Lisa Gentile