Related papers: Knowledge Generation for Zero-shot Knowledge-based…

Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering

Zero-shot visual question answering (ZS-VQA), an emerging critical research area, aims to answer visual questions without training samples. Existing research in ZS-VQA has proposed to leverage knowledge graphs or large language…

Computer Vision and Pattern Recognition · Computer Science 2025-01-23 Qian Tao, Xiaoyang Fan, Yong Xu, Xingquan Zhu, Yufei Tang

Zero-shot Visual Question Answering using Knowledge Graph

Incorporating external knowledge to Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with different components for knowledge matching and extraction, feature learning,…

Artificial Intelligence · Computer Science 2021-10-19 Zhuo Chen, Jiaoyan Chen, Yuxia Geng, Jeff Z. Pan, Zonggang Yuan, Huajun Chen

GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance

Knowledge-Based Visual Question Answering (KB-VQA) methods focus on tasks that demand reasoning with information extending beyond the explicit content depicted in the image. Early methods relied on explicit knowledge bases to provide this…

Computation and Language · Computer Science 2025-05-27 Mohammad Mahdi Moradi, Sudhir Mudur

Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts

Zero-shot Visual Question Answering (VQA) is a prominent vision-language task that examines both the visual and textual understanding capability of systems in the absence of training data. Recently, by converting the images into captions,…

Computer Vision and Pattern Recognition · Computer Science 2023-11-16 Yunshi Lan, Xiang Li, Xin Liu, Yang Li, Wei Qin, Weining Qian

K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition

Visual Question Generation (VQG) is a task to generate questions from images. When humans ask questions about an image, their goal is often to acquire some new knowledge. However, existing studies on VQG have mainly addressed question…

Computer Vision and Pattern Recognition · Computer Science 2022-03-16 Kohei Uehara, Tatsuya Harada

Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models

Knowledge-based Visual Question Answering (KVQA) requires both image and world knowledge to answer questions. Current methods first retrieve knowledge from the image and external knowledge base with the original complex question, then…

Computer Vision and Pattern Recognition · Computer Science 2024-07-23 Wenbin An, Feng Tian, Jiahao Nie, Wenkai Shi, Haonan Lin, Yan Chen, QianYing Wang, Yaqiang Wu, Guang Dai, Ping Chen

Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering

Large Language Models (LLMs) are capable of performing zero-shot closed-book question answering tasks, based on their internal knowledge stored in parameters during pre-training. However, such internalized knowledge might be insufficient…

Computation and Language · Computer Science 2023-06-08 Jinheon Baek, Alham Fikri Aji, Amir Saffari

A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task

Knowledge-based Vision Question Answering (KB-VQA) extends general Vision Question Answering (VQA) by requiring not only the understanding of visual and textual inputs but also an extensive range of knowledge, enabling significant advancements…

Computer Vision and Pattern Recognition · Computer Science 2025-04-25 Jiaqi Deng, Zonghan Wu, Huan Huo, Guandong Xu

Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base

The knowledge-based visual question answering (KVQA) task aims to answer questions that require additional external knowledge as well as an understanding of images and questions. Recent studies on KVQA inject external knowledge in a…

Computer Vision and Pattern Recognition · Computer Science 2022-07-28 Jinyeong Chae, Jihie Kim

A Simple Baseline for Knowledge-Based Visual Question Answering

This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA). Recent works have emphasized the significance of incorporating both explicit (through external databases) and implicit (through LLMs) knowledge to answer…

Computer Vision and Pattern Recognition · Computer Science 2023-10-25 Alexandros Xenos, Themos Stafylakis, Ioannis Patras, Georgios Tzimiropoulos

LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

Visual question answering (VQA) often requires an understanding of visual concepts and language semantics, which relies on external knowledge. Most existing methods exploit pre-trained language models and/or unstructured text, but the…

Computer Vision and Pattern Recognition · Computer Science 2022-11-29 Zhuo Chen, Yufeng Huang, Jiaoyan Chen, Yuxia Geng, Yin Fang, Jeff Pan, Ningyu Zhang, Wen Zhang

Find The Gap: Knowledge Base Reasoning For Visual Question Answering

We analyze knowledge-based visual question answering, for which given a question, the models need to ground it into the visual modality and retrieve the relevant knowledge from a given large knowledge base (KB) to be able to answer. Our…

Artificial Intelligence · Computer Science 2024-04-17 Elham J. Barezi, Parisa Kordjamshidi

Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering

Visual question answering (VQA) is a multidisciplinary research problem that is pursued through practices of natural language processing and computer vision. Visual question answering automatically answers natural language questions according…

Computer Vision and Pattern Recognition · Computer Science 2024-09-01 Param Ahir, Hiteishi Diwanji

Knowledge Condensation and Reasoning for Knowledge-based VQA

Knowledge-based visual question answering (KB-VQA) is a challenging task, which requires the model to leverage external knowledge for comprehending and answering questions grounded in visual content. Recent studies retrieve the knowledge…

Computer Vision and Pattern Recognition · Computer Science 2024-03-18 Dongze Hao, Jian Jia, Longteng Guo, Qunbo Wang, Te Yang, Yan Li, Yanhua Cheng, Bo Wang, Quan Chen, Han Li, Jing Liu

Modularized Zero-shot VQA with Pre-trained Models

Large-scale pre-trained models (PTMs) show great zero-shot capabilities. In this paper, we study how to leverage them for zero-shot visual question answering (VQA). Our approach is motivated by a few observations. First, VQA questions often…

Computer Vision and Pattern Recognition · Computer Science 2024-01-25 Rui Cao, Jing Jiang

Knowledge-aware Visual Question Generation for Remote Sensing Images

With the rapid development of remote sensing image archives, asking questions about images has become an effective way of gathering specific information or performing image retrieval. However, automatically generated image-based questions…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Siran Li, Li Mi, Javiera Castillo-Navarro, Devis Tuia

Zero-Shot Visual Question Answering

Part of the appeal of Visual Question Answering (VQA) is its promise to answer new questions about previously unseen images. Most current methods demand training questions that illustrate every possible concept, and will therefore never…

Computer Vision and Pattern Recognition · Computer Science 2016-11-22 Damien Teney, Anton van den Hengel

Zero-Shot Question Generation from Knowledge Graphs for Unseen Predicates and Entity Types

We present a neural model for question generation from knowledge base triples in a "Zero-Shot" setup, that is generating questions for triples containing predicates, subject types or object types that were not seen at training time. Our…

Computation and Language · Computer Science 2018-02-21 Hady Elsahar, Christophe Gravier, Frederique Laforest

Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering

Recent developments in pre-trained neural language modeling have led to leaps in accuracy on commonsense question-answering benchmarks. However, there is increasing concern that models overfit to specific tasks, without learning to utilize…

Computation and Language · Computer Science 2020-12-16 Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, Alessandro Oltramari

Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts?

An ability to learn about new objects from a small amount of visual data and produce convincing linguistic justification about the presence/absence of certain concepts (that collectively compose the object) in novel scenarios is an…

Computer Vision and Pattern Recognition · Computer Science 2024-10-18 Shailaja Keyur Sampat, Maitreya Patel, Yezhou Yang, Chitta Baral