Related papers: Graph-Structured Representations for Visual Questi…

200 papers

Visual Question Answering is a challenging problem requiring a combination of concepts from Computer Vision and Natural Language Processing. Most existing approaches use a two-stream strategy, computing image and question features that are…

Computer Vision and Pattern Recognition · Computer Science 2018-11-02 Will Norcliffe-Brown, Efstathios Vafeias, Sarah Parisot

Visual Question Answering (VQA) attracts much attention from both industry and academia. As a multi-modality task, it is challenging since it requires not only visual and textual understanding, but also the ability to align cross-modality…

Computer Vision and Pattern Recognition · Computer Science 2022-01-27 Peixi Xiong, Quanzeng You, Pei Yu, Zicheng Liu, Ying Wu

Visual question answering (VQA) in medical imaging aims to support clinical diagnosis by automatically interpreting complex imaging data in response to natural language queries. Existing studies typically rely on distinct visual and textual…

Computer Vision and Pattern Recognition · Computer Science 2025-07-08 Yuanhe Tian, Chen Su, Junwen Duan, Yan Song

Visual question answering (Visual QA) has attracted significant attention in recent years. While a variety of algorithms have been proposed, most of them are built upon different combinations of image and language features as well as…

Computer Vision and Pattern Recognition · Computer Science 2019-07-30 Cheng Zhang, Wei-Lun Chao, Dong Xuan

Visual question answering (VQA) requires systems to perform concept-level reasoning by unifying unstructured (e.g., the context in question and answer; "QA context") and structured (e.g., knowledge graph for the QA context and scene;…

Computer Vision and Pattern Recognition · Computer Science 2023-09-18 Yanan Wang, Michihiro Yasunaga, Hongyu Ren, Shinya Wada, Jure Leskovec

One of the key issues in Visual Question Answering (VQA) is reasoning with semantic clues in the visual content under the guidance of the question; how to model relational semantics remains a great challenge. To fully capture…

Multimedia · Computer Science 2019-08-22 Zhuoqian Yang, Zengchang Qin, Jing Yu, Yue Hu

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires…

Computer Vision and Pattern Recognition · Computer Science 2016-07-21 Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel

Visual Question Answering (VQA) concerns providing answers to Natural Language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in…

Artificial Intelligence · Computer Science 2020-02-03 Mehrdad Alizadeh, Barbara Di Eugenio

Images are more than a collection of objects or attributes -- they represent a web of relationships among interconnected objects. Scene Graph has emerged as a new modality for a structured graphical representation of images. Scene Graph…

Computation and Language · Computer Science 2021-06-03 Weixin Liang, Yanhao Jiang, Zixuan Liu

Visual Question Answering (VQA) is of tremendous interest to the research community with important applications such as aiding visually impaired users and image-based search. In this work, we explore the use of scene graphs for solving the…

Computer Vision and Pattern Recognition · Computer Science 2021-01-19 Vinay Damodaran, Sharanya Chakravarthy, Akshay Kumar, Anjana Umapathy, Teruko Mitamura, Yuta Nakashima, Noa Garcia, Chenhui Chu

Visual Query Answering (VQA) is of great significance in offering people convenience: one can raise a question for details of objects, or high-level understanding about the scene, over an image. This paper proposes a novel method to address…

Computer Vision and Pattern Recognition · Computer Science 2019-03-19 Peixi Xiong, Huayi Zhan, Xin Wang, Baivab Sinha, Ying Wu

How far can we go with textual representations for understanding pictures? In image understanding, it is essential to use concise but detailed image representations. Deep visual features extracted by vision models, such as Faster R-CNN, are…

Computer Vision and Pattern Recognition · Computer Science 2021-06-28 Yusuke Hirota, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Ittetsu Taniguchi, Takao Onoye

Visual question answering (VQA) has been gaining a lot of traction in the machine learning community in recent years due to the challenges posed in understanding information coming from multiple modalities (i.e., images, language). In…

Computer Vision and Pattern Recognition · Computer Science 2021-11-11 Muralikrishnna G. Sethuraman, Ali Payani, Faramarz Fekri, J. Clayton Kerce

This thesis report studies methods to solve Visual Question-Answering (VQA) tasks with a Deep Learning framework. As a preliminary step, we explore Long Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to tackle…

Computation and Language · Computer Science 2016-10-11 Issey Masuda, Santiago Pascual de la Puente, Xavier Giro-i-Nieto

Paragraph-style image captions describe diverse aspects of an image as opposed to the more common single-sentence captions that only provide an abstract description of the image. These paragraph captions can hence contain substantial…

Computation and Language · Computer Science 2019-06-17 Hyounghun Kim, Mohit Bansal

Visual Question Answering (VQA) is a challenging problem that requires processing multimodal input. Answer-Set Programming (ASP) has shown great potential in this regard to add interpretability and explainability to modular VQA…

Artificial Intelligence · Computer Science 2025-02-14 Jakob Johannes Bauer, Thomas Eiter, Nelson Higuera Ruiz, Johannes Oetsch

Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image. This ability is challenging but indispensable to achieve general VQA. One limitation of existing…

Artificial Intelligence · Computer Science 2020-11-04 Jing Yu, Zihao Zhu, Yujing Wang, Weifeng Zhang, Yue Hu, Jianlong Tan

Visual question answering requires a deep understanding of both images and natural language. However, most methods mainly focus on visual concepts, such as the relationships between various objects. The limited use of object categories…

Computer Vision and Pattern Recognition · Computer Science 2021-01-25 Jung-Jun Kim, Dong-Gyu Lee, Jialin Wu, Hong-Gyu Jung, Seong-Whan Lee

The Visual Question Answering (VQA) task combines the challenges of visual and linguistic processing to answer basic 'common sense' questions about given images. Given an image and a question in natural language, the…

Computer Vision and Pattern Recognition · Computer Science 2020-12-24 Yash Srivastava, Vaishnav Murali, Shiv Ram Dubey, Snehasis Mukherjee

The intersection of vision and language is of major interest due to the increased focus on seamless integration between recognition and reasoning. Scene graphs (SGs) have emerged as a useful tool for multimodal image analysis, showing…

Computer Vision and Pattern Recognition · Computer Science 2023-10-04 Bruno Souza, Marius Aasan, Helio Pedrini, Adín Ramírez Rivera