Related papers: VisKoP: Visual Knowledge oriented Programming for …

A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task

Knowledge-based Vision Question Answering (KB-VQA) extends general Vision Question Answering (VQA) by requiring not only an understanding of visual and textual inputs but also an extensive range of knowledge, enabling significant advancements…

Computer Vision and Pattern Recognition · Computer Science 2025-04-25 Jiaqi Deng, Zonghan Wu, Huan Huo, Guandong Xu

A Simple Baseline for Knowledge-Based Visual Question Answering

This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA). Recent works have emphasized the significance of incorporating both explicit (through external databases) and implicit (through LLMs) knowledge to answer…

Computer Vision and Pattern Recognition · Computer Science 2023-10-25 Alexandros Xenos, Themos Stafylakis, Ioannis Patras, Georgios Tzimiropoulos

Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models

This study explores the realm of knowledge base question answering (KBQA). KBQA is considered a challenging task, particularly in parsing intricate questions into executable logical forms. Traditional semantic parsing (SP)-based methods…

Computation and Language · Computer Science 2025-03-13 Guanming Xiong, Junwei Bao, Wen Zhao

Learning to Search: A Decision-Based Agent for Knowledge-Based Visual Question Answering

Knowledge-based visual question answering (KB-VQA) requires vision-language models to understand images and use external knowledge, especially for rare entities and long-tail facts. Most existing retrieval-augmented generation (RAG) methods…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Zhuohong Chen, Zhenxian Wu, Yunyao Yu, Hangrui Xu, Zirui Liao, Zhifang Liu, Xiangwen Deng, Pen Jiao, Haoqian Wang

KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base

Complex question answering over knowledge bases (Complex KBQA) is challenging because it requires various compositional reasoning capabilities, such as multi-hop inference, attribute comparison, and set operations. Existing benchmarks have some…

Computation and Language · Computer Science 2022-06-24 Shulin Cao, Jiaxin Shi, Liangming Pan, Lunyiu Nie, Yutong Xiang, Lei Hou, Juanzi Li, Bin He, Hanwang Zhang

Find The Gap: Knowledge Base Reasoning For Visual Question Answering

We analyze knowledge-based visual question answering, for which given a question, the models need to ground it into the visual modality and retrieve the relevant knowledge from a given large knowledge base (KB) to be able to answer. Our…

Artificial Intelligence · Computer Science 2024-04-17 Elham J. Barezi, Parisa Kordjamshidi

Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models

In the realm of multimodal tasks, Visual Question Answering (VQA) plays a crucial role by addressing natural language questions grounded in visual content. Knowledge-Based Visual Question Answering (KBVQA) advances this concept by adding…

Computation and Language · Computer Science 2024-06-17 Manas Jhalani, Annervaz K M, Pushpak Bhattacharyya

From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities

The multimodal task of Visual Question Answering (VQA), encompassing elements of Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers to questions on any visual input. Over time, the scope of VQA has expanded…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Md Farhan Ishmam, Md Sakib Hossain Shovon, M. F. Mridha, Nilanjan Dey

Recursive Visual Programming

Visual Programming (VP) has emerged as a powerful framework for Visual Question Answering (VQA). By generating and executing bespoke code for each question, these methods demonstrate impressive compositional and reasoning capabilities,…

Computer Vision and Pattern Recognition · Computer Science 2024-07-11 Jiaxin Ge, Sanjay Subramanian, Baifeng Shi, Roei Herzig, Trevor Darrell

Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA

We study the Knowledge-Based visual question-answering problem, for which given a question, the models need to ground it into the visual modality to find the answer. Although many recent works use question-dependent captioners to verbalize…

Artificial Intelligence · Computer Science 2024-06-28 Elham J. Barezi, Parisa Kordjamshidi

LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

Visual question answering (VQA) often requires an understanding of visual concepts and language semantics, which relies on external knowledge. Most existing methods exploit pre-trained language models or/and unstructured text, but the…

Computer Vision and Pattern Recognition · Computer Science 2022-11-29 Zhuo Chen, Yufeng Huang, Jiaoyan Chen, Yuxia Geng, Yin Fang, Jeff Pan, Ningyu Zhang, Wen Zhang

Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing

Visual Question Answering (VQA) systems are tasked with answering natural language questions corresponding to a presented image. Traditional VQA datasets typically contain questions related to the spatial information of objects, object…

Computation and Language · Computer Science 2020-06-05 Goonmeet Bajaj, Bortik Bandyopadhyay, Daniel Schmidt, Pranav Maneriker, Christopher Myers, Srinivasan Parthasarathy

Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering

Knowledge-based visual question answering (KB-VQA) requires visual language models (VLMs) to integrate visual understanding with external knowledge retrieval. Although retrieval-augmented generation (RAG) achieves significant advances in…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Yuyang Hong, Jiaqi Gu, Qi Yang, Lubin Fan, Yue Wu, Ying Wang, Kun Ding, Shiming Xiang, Jieping Ye

Exploring Question Decomposition for Zero-Shot VQA

Visual question answering (VQA) has traditionally been treated as a single-step task where each question receives the same amount of effort, unlike natural human question-answering strategies. We explore a question decomposition strategy…

Computer Vision and Pattern Recognition · Computer Science 2023-10-27 Zaid Khan, Vijay Kumar BG, Samuel Schulter, Manmohan Chandraker, Yun Fu

Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering

Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image. This ability is challenging but indispensable to achieve general VQA. One limitation of existing…

Artificial Intelligence · Computer Science 2020-11-04 Jing Yu, Zihao Zhu, Yujing Wang, Weifeng Zhang, Yue Hu, Jianlong Tan

Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection

Knowledge-Based Visual Question Answering (KBVQA) is a bi-modal task requiring external world knowledge in order to correctly answer a text question and associated image. Recent single modality text work has shown knowledge injection into…

Computation and Language · Computer Science 2022-05-30 Diego Garcia-Olano, Yasumasa Onoe, Joydeep Ghosh

FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering

Knowledge base question answering (KBQA) is a critical yet challenging task due to the vast number of entities within knowledge bases and the diversity of natural language questions posed by users. Unfortunately, the performance of most…

Computation and Language · Computer Science 2024-01-29 Zhenyu Li, Sunqi Fan, Yu Gu, Xiuxing Li, Zhichao Duan, Bowen Dong, Ning Liu, Jianyong Wang

Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering

Visual question answering (VQA) is a multidisciplinary research problem pursued through the practices of natural language processing and computer vision. Visual question answering automatically answers natural language questions according…

Computer Vision and Pattern Recognition · Computer Science 2024-09-01 Param Ahir, Hiteishi Diwanji

A Knowledge-Injected Curriculum Pretraining Framework for Question Answering

Knowledge-based question answering (KBQA) is a key task in NLP research, and also an approach to access the web data and knowledge, which requires exploiting knowledge graphs (KGs) for reasoning. In the literature, one promising solution…

Computation and Language · Computer Science 2024-03-18 Xin Lin, Tianhuang Su, Zhenya Huang, Shangzi Xue, Haifeng Liu, Enhong Chen

Visual Question Answering: A Survey of Methods and Datasets

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires…

Computer Vision and Pattern Recognition · Computer Science 2016-07-21 Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel