Related papers: Graph-Structured Representations for Visual Questi…

Learning Conditioned Graph Structures for Interpretable Visual Question Answering

Visual Question answering is a challenging problem requiring a combination of concepts from Computer Vision and Natural Language Processing. Most existing approaches use a two streams strategy, computing image and question features that are…

Computer Vision and Pattern Recognition · Computer Science 2018-11-02 Will Norcliffe-Brown , Efstathios Vafeias , Sarah Parisot

SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering

Visual Question Answering (VQA) attracts much attention from both industry and academia. As a multi-modality task, it is challenging since it requires not only visual and textual understanding, but also the ability to align cross-modality…

Computer Vision and Pattern Recognition · Computer Science 2022-01-27 Peixi Xiong , Quanzeng You , Pei Yu , Zicheng Liu , Ying Wu

Computed Tomography Visual Question Answering with Cross-modal Feature Graphing

Visual question answering (VQA) in medical imaging aims to support clinical diagnosis by automatically interpreting complex imaging data in response to natural language queries. Existing studies typically rely on distinct visual and textual…

Computer Vision and Pattern Recognition · Computer Science 2025-07-08 Yuanhe Tian , Chen Su , Junwen Duan , Yan Song

An Empirical Study on Leveraging Scene Graphs for Visual Question Answering

Visual question answering (Visual QA) has attracted significant attention these years. While a variety of algorithms have been proposed, most of them are built upon different combinations of image and language features as well as…

Computer Vision and Pattern Recognition · Computer Science 2019-07-30 Cheng Zhang , Wei-Lun Chao , Dong Xuan

VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering

Visual question answering (VQA) requires systems to perform concept-level reasoning by unifying unstructured (e.g., the context in question and answer; "QA context") and structured (e.g., knowledge graph for the QA context and scene;…

Computer Vision and Pattern Recognition · Computer Science 2023-09-18 Yanan Wang , Michihiro Yasunaga , Hongyu Ren , Shinya Wada , Jure Leskovec

Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering

One of the key issues of Visual Question Answering (VQA) is to reason with semantic clues in the visual content under the guidance of the question, how to model relational semantics still remains as a great challenge. To fully capture…

Multimedia · Computer Science 2019-08-22 Zhuoqian Yang , Zengchang Qin , Jing Yu , Yue Hu

Visual Question Answering: A Survey of Methods and Datasets

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires…

Computer Vision and Pattern Recognition · Computer Science 2016-07-21 Qi Wu , Damien Teney , Peng Wang , Chunhua Shen , Anthony Dick , Anton van den Hengel

Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach

Visual Question Answering (VQA) concerns providing answers to Natural Language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in…

Artificial Intelligence · Computer Science 2020-02-03 Mehrdad Alizadeh , Barbara Di Eugenio

GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering

Images are more than a collection of objects or attributes -- they represent a web of relationships among interconnected objects. Scene Graph has emerged as a new modality for a structured graphical representation of images. Scene Graph…

Computation and Language · Computer Science 2021-06-03 Weixin Liang , Yanhao Jiang , Zixuan Liu

Understanding the Role of Scene Graphs in Visual Question Answering

Visual Question Answering (VQA) is of tremendous interest to the research community with important applications such as aiding visually impaired users and image-based search. In this work, we explore the use of scene graphs for solving the…

Computer Vision and Pattern Recognition · Computer Science 2021-01-19 Vinay Damodaran , Sharanya Chakravarthy , Akshay Kumar , Anjana Umapathy , Teruko Mitamura , Yuta Nakashima , Noa Garcia , Chenhui Chu

Visual Query Answering by Entity-Attribute Graph Matching and Reasoning

Visual Query Answering (VQA) is of great significance in offering people convenience: one can raise a question for details of objects, or high-level understanding about the scene, over an image. This paper proposes a novel method to address…

Computer Vision and Pattern Recognition · Computer Science 2019-03-19 Peixi Xiong , Huayi Zhan , Xin Wang , Baivab Sinha , Ying Wu

A Picture May Be Worth a Hundred Words for Visual Question Answering

How far can we go with textual representations for understanding pictures? In image understanding, it is essential to use concise but detailed image representations. Deep visual features extracted by vision models, such as Faster R-CNN, are…

Computer Vision and Pattern Recognition · Computer Science 2021-06-28 Yusuke Hirota , Noa Garcia , Mayu Otani , Chenhui Chu , Yuta Nakashima , Ittetsu Taniguchi , Takao Onoye

Visual Question Answering based on Formal Logic

Visual question answering (VQA) has been gaining a lot of traction in the machine learning community in the recent years due to the challenges posed in understanding information coming from multiple modalities (i.e., images, language). In…

Computer Vision and Pattern Recognition · Computer Science 2021-11-11 Muralikrishnna G. Sethuraman , Ali Payani , Faramarz Fekri , J. Clayton Kerce

Open-Ended Visual Question-Answering

This thesis report studies methods to solve Visual Question-Answering (VQA) tasks with a Deep Learning framework. As a preliminary step, we explore Long Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to tackle…

Computation and Language · Computer Science 2016-10-11 Issey Masuda , Santiago Pascual de la Puente , Xavier Giro-i-Nieto

Improving Visual Question Answering by Referring to Generated Paragraph Captions

Paragraph-style image captions describe diverse aspects of an image as opposed to the more common single-sentence captions that only provide an abstract description of the image. These paragraph captions can hence contain substantial…

Computation and Language · Computer Science 2019-06-17 Hyounghun Kim , Mohit Bansal

Visual Graph Question Answering with ASP and LLMs for Language Parsing

Visual Question Answering (VQA) is a challenging problem that requires to process multimodal input. Answer-Set Programming (ASP) has shown great potential in this regard to add interpretability and explainability to modular VQA…

Artificial Intelligence · Computer Science 2025-02-14 Jakob Johannes Bauer , Thomas Eiter , Nelson Higuera Ruiz , Johannes Oetsch

Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering

Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image. This ability is challenging but indispensable to achieve general VQA. One limitation of existing…

Artificial Intelligence · Computer Science 2020-11-04 Jing Yu , Zihao Zhu , Yujing Wang , Weifeng Zhang , Yue Hu , Jianlong Tan

Visual Question Answering based on Local-Scene-Aware Referring Expression Generation

Visual question answering requires a deep understanding of both images and natural language. However, most methods mainly focus on visual concept; such as the relationships between various objects. The limited use of object categories…

Computer Vision and Pattern Recognition · Computer Science 2021-01-25 Jung-Jun Kim , Dong-Gyu Lee , Jialin Wu , Hong-Gyu Jung , Seong-Whan Lee

Visual Question Answering using Deep Learning: A Survey and Performance Analysis

The Visual Question Answering (VQA) task combines challenges for processing data with both Visual and Linguistic processing, to answer basic `common sense' questions about given images. Given an image and a question in natural language, the…

Computer Vision and Pattern Recognition · Computer Science 2020-12-24 Yash Srivastava , Vaishnav Murali , Shiv Ram Dubey , Snehasis Mukherjee

SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering

The intersection of vision and language is of major interest due to the increased focus on seamless integration between recognition and reasoning. Scene graphs (SGs) have emerged as a useful tool for multimodal image analysis, showing…

Computer Vision and Pattern Recognition · Computer Science 2023-10-04 Bruno Souza , Marius Aasan , Helio Pedrini , Adín Ramírez Rivera