Related papers: Learning Conditioned Graph Structures for Interpre…

Graph-Structured Representations for Visual Question Answering

This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is to require joint reasoning over the visual and text domains. The predominant…

Computer Vision and Pattern Recognition · Computer Science · 2017-03-31 · Damien Teney, Lingqiao Liu, Anton van den Hengel

Visual Graph Question Answering with ASP and LLMs for Language Parsing

Visual Question Answering (VQA) is a challenging problem that requires to process multimodal input. Answer-Set Programming (ASP) has shown great potential in this regard to add interpretability and explainability to modular VQA…

Artificial Intelligence · Computer Science · 2025-02-14 · Jakob Johannes Bauer, Thomas Eiter, Nelson Higuera Ruiz, Johannes Oetsch

An Empirical Study on Leveraging Scene Graphs for Visual Question Answering

Visual question answering (Visual QA) has attracted significant attention these years. While a variety of algorithms have been proposed, most of them are built upon different combinations of image and language features as well as…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-30 · Cheng Zhang, Wei-Lun Chao, Dong Xuan

Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning

Medical visual question answering (VQA) aims to answer clinically relevant questions regarding input medical images. This technique has the potential to improve the efficiency of medical professionals while relieving the burden on the…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-21 · Xinyue Hu, Lin Gu, Kazuma Kobayashi, Qiyuan An, Qingyu Chen, Zhiyong Lu, Chang Su, Tatsuya Harada, Yingying Zhu

Visual Question Answering: A Survey of Methods and Datasets

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires…

Computer Vision and Pattern Recognition · Computer Science · 2016-07-21 · Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel

Visual Question Answering using Deep Learning: A Survey and Performance Analysis

The Visual Question Answering (VQA) task combines challenges for processing data with both Visual and Linguistic processing, to answer basic 'common sense' questions about given images. Given an image and a question in natural language, the…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-24 · Yash Srivastava, Vaishnav Murali, Shiv Ram Dubey, Snehasis Mukherjee

Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering

Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image. This ability is challenging but indispensable to achieve general VQA. One limitation of existing…

Artificial Intelligence · Computer Science · 2020-11-04 · Jing Yu, Zihao Zhu, Yujing Wang, Weifeng Zhang, Yue Hu, Jianlong Tan

Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering

Fact-based Visual Question Answering (FVQA) requires external knowledge beyond visible content to answer questions about an image, which is challenging but indispensable to achieve general VQA. One limitation of existing FVQA solutions is…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-05 · Zihao Zhu, Jing Yu, Yujing Wang, Yajing Sun, Yue Hu, Qi Wu

SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering

Visual Question Answering (VQA) attracts much attention from both industry and academia. As a multi-modality task, it is challenging since it requires not only visual and textual understanding, but also the ability to align cross-modality…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-27 · Peixi Xiong, Quanzeng You, Pei Yu, Zicheng Liu, Ying Wu

Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering

This paper presents a novel method, termed Bridge to Answer, to infer correct answers for questions about a given video by leveraging adequate graph interactions of heterogeneous crossmodal graphs. To realize this, we learn question…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-30 · Jungin Park, Jiyoung Lee, Kwanghoon Sohn

Scene Graph Reasoning for Visual Question Answering

Visual question answering is concerned with answering free-form questions about an image. Since it requires a deep linguistic understanding of the question and the ability to associate it with various objects that are present in the image,…

Machine Learning · Computer Science · 2020-07-03 · Marcel Hildebrandt, Hang Li, Rajat Koner, Volker Tresp, Stephan Günnemann

Learning Situation Hyper-Graphs for Video Question Answering

Answering questions about complex situations in videos requires not only capturing the presence of actors, objects, and their relations but also the evolution of these relationships over time. A situation hyper-graph is a representation…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-09 · Aisha Urooj Khan, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah

Understanding the Role of Scene Graphs in Visual Question Answering

Visual Question Answering (VQA) is of tremendous interest to the research community with important applications such as aiding visually impaired users and image-based search. In this work, we explore the use of scene graphs for solving the…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-19 · Vinay Damodaran, Sharanya Chakravarthy, Akshay Kumar, Anjana Umapathy, Teruko Mitamura, Yuta Nakashima, Noa Garcia, Chenhui Chu

Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering

Answering semantically-complicated questions according to an image is challenging in Visual Question Answering (VQA) task. Although the image can be well represented by deep learning, the question is always simply embedded and cannot well…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-15 · JianJian Cao, Xiameng Qin, Sanyuan Zhao, Jianbing Shen

GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering

Images are more than a collection of objects or attributes -- they represent a web of relationships among interconnected objects. Scene Graph has emerged as a new modality for a structured graphical representation of images. Scene Graph…

Computation and Language · Computer Science · 2021-06-03 · Weixin Liang, Yanhao Jiang, Zixuan Liu

Visual Query Answering by Entity-Attribute Graph Matching and Reasoning

Visual Query Answering (VQA) is of great significance in offering people convenience: one can raise a question for details of objects, or high-level understanding about the scene, over an image. This paper proposes a novel method to address…

Computer Vision and Pattern Recognition · Computer Science · 2019-03-19 · Peixi Xiong, Huayi Zhan, Xin Wang, Baivab Sinha, Ying Wu

Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering

One of the key issues of Visual Question Answering (VQA) is to reason with semantic clues in the visual content under the guidance of the question; how to model relational semantics remains a great challenge. To fully capture…

Multimedia · Computer Science · 2019-08-22 · Zhuoqian Yang, Zengchang Qin, Jing Yu, Yue Hu

Visual Question Answering based on Formal Logic

Visual question answering (VQA) has been gaining a lot of traction in the machine learning community in the recent years due to the challenges posed in understanding information coming from multiple modalities (i.e., images, language). In…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-11 · Muralikrishnna G. Sethuraman, Ali Payani, Faramarz Fekri, J. Clayton Kerce

Visual Question Answering as Reading Comprehension

Visual question answering (VQA) demands simultaneous comprehension of both the image visual content and natural language questions. In some cases, the reasoning needs the help of common sense or general knowledge which usually appear in the…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-30 · Hui Li, Peng Wang, Chunhua Shen, Anton van den Hengel

VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering

Visual question answering (VQA) requires systems to perform concept-level reasoning by unifying unstructured (e.g., the context in question and answer; "QA context") and structured (e.g., knowledge graph for the QA context and scene;…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-18 · Yanan Wang, Michihiro Yasunaga, Hongyu Ren, Shinya Wada, Jure Leskovec