Related papers: Zero-shot Visual Question Answering using Knowledg…

Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering

Zero-shot visual question answering (ZS-VQA), an emerging and critical research area, aims to answer visual questions without providing training samples. Existing research in ZS-VQA has proposed to leverage knowledge graphs or large language…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-23 · Qian Tao, Xiaoyang Fan, Yong Xu, Xingquan Zhu, Yufei Tang

Zero-Shot Visual Question Answering

Part of the appeal of Visual Question Answering (VQA) is its promise to answer new questions about previously unseen images. Most current methods demand training questions that illustrate every possible concept, and will therefore never…

Computer Vision and Pattern Recognition · Computer Science · 2016-11-22 · Damien Teney, Anton van den Hengel

Knowledge Generation for Zero-shot Knowledge-based VQA

Previous solutions to knowledge-based visual question answering (K-VQA) retrieve knowledge from external knowledge bases and use supervised learning to train the K-VQA model. Recently, pre-trained LLMs have been used as both a knowledge…

Computation and Language · Computer Science · 2024-02-07 · Rui Cao, Jing Jiang

VQA with no questions-answers training

Methods for teaching machines to answer visual questions have made significant progress in recent years, but current methods still lack important human capabilities, including integrating new visual classes and concepts in a modular manner,…

Computer Vision and Pattern Recognition · Computer Science · 2020-05-27 · Ben-Zion Vatashsky, Shimon Ullman

Exploring Question Decomposition for Zero-Shot VQA

Visual question answering (VQA) has traditionally been treated as a single-step task where each question receives the same amount of effort, unlike natural human question-answering strategies. We explore a question decomposition strategy…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-27 · Zaid Khan, Vijay Kumar BG, Samuel Schulter, Manmohan Chandraker, Yun Fu

A Simple Baseline for Knowledge-Based Visual Question Answering

This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA). Recent works have emphasized the significance of incorporating both explicit (through external databases) and implicit (through LLMs) knowledge to answer…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-25 · Alexandros Xenos, Themos Stafylakis, Ioannis Patras, Georgios Tzimiropoulos

Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts

Visual question answering (VQA) is known as an AI-complete task as it requires understanding, reasoning, and inferring about the vision and the language content. Over the past few years, numerous neural architectures have been suggested for…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-15 · Övgü Özdemir, Erdem Akagündüz

Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering

Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image. This ability is challenging but indispensable to achieve general VQA. One limitation of existing…

Artificial Intelligence · Computer Science · 2020-11-04 · Jing Yu, Zihao Zhu, Yujing Wang, Weifeng Zhang, Yue Hu, Jianlong Tan

Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings

Fact-based Visual Question Answering (FVQA), a challenging variant of VQA, requires a QA-system to include facts from a diverse knowledge graph (KG) in its reasoning process to produce an answer. Large KGs, especially common-sense KGs, are…

Computation and Language · Computer Science · 2021-06-22 · Kiran Ramnath, Mark Hasegawa-Johnson

ZeShot-VQA: Zero-Shot Visual Question Answering Framework with Answer Mapping for Natural Disaster Damage Assessment

Natural disasters usually affect vast areas and devastate infrastructures. Performing a timely and efficient response is crucial to minimize the impact on affected communities, and data-driven approaches are the best choice. Visual question…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-10 · Ehsan Karimi, Maryam Rahnemoonfar

Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts

Zero-shot Visual Question Answering (VQA) is a prominent vision-language task that examines both the visual and textual understanding capability of systems in the absence of training data. Recently, by converting the images into captions,…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-16 · Yunshi Lan, Xiang Li, Xin Liu, Yang Li, Wei Qin, Weining Qian

Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base

The knowledge-based visual question answering (KVQA) task aims to answer questions that require additional external knowledge as well as an understanding of images and questions. Recent studies on KVQA inject external knowledge in a…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-28 · Jinyeong Chae, Jihie Kim

Learning Conditioned Graph Structures for Interpretable Visual Question Answering

Visual question answering is a challenging problem requiring a combination of concepts from computer vision and natural language processing. Most existing approaches use a two-stream strategy, computing image and question features that are…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-02 · Will Norcliffe-Brown, Efstathios Vafeias, Sarah Parisot

Visual Question Answering: A Survey of Methods and Datasets

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires…

Computer Vision and Pattern Recognition · Computer Science · 2016-07-21 · Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel

Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts?

An ability to learn about new objects from a small amount of visual data and produce convincing linguistic justification about the presence/absence of certain concepts (that collectively compose the object) in novel scenarios is an…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-18 · Shailaja Keyur Sampat, Maitreya Patel, Yezhou Yang, Chitta Baral

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

Visual Question Answering (VQA) in its ideal form lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA benchmarks to date are focused on questions…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-05 · Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi

Modularized Zero-shot VQA with Pre-trained Models

Large-scale pre-trained models (PTMs) show great zero-shot capabilities. In this paper, we study how to leverage them for zero-shot visual question answering (VQA). Our approach is motivated by a few observations. First, VQA questions often…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-25 · Rui Cao, Jing Jiang

Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering

Visual question answering (VQA) is a multidisciplinary research problem pursued through the practices of natural language processing and computer vision. Visual question answering automatically answers natural language questions according…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-01 · Param Ahir, Hiteishi Diwanji

Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding

Though beneficial for encouraging the Visual Question Answering (VQA) models to discover the underlying knowledge by exploiting the input-output correlation beyond image and text contexts, the existing knowledge VQA datasets are mostly…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-15 · Qingxing Cao, Bailin Li, Xiaodan Liang, Keze Wang, Liang Lin

Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning

Medical visual question answering (VQA) aims to answer clinically relevant questions regarding input medical images. This technique has the potential to improve the efficiency of medical professionals while relieving the burden on the…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-21 · Xinyue Hu, Lin Gu, Kazuma Kobayashi, Qiyuan An, Qingyu Chen, Zhiyong Lu, Chang Su, Tatsuya Harada, Yingying Zhu