Related papers: Object-based reasoning in VQA

A survey on VQA: Datasets and Approaches

Visual question answering (VQA) is a task that combines both the techniques of computer vision and natural language processing. It requires models to answer a text-based question according to the information contained in a visual. In recent…

Computer Vision and Pattern Recognition · Computer Science 2021-05-04 Yeyun Zou , Qiyu Xie

Visual Question Answering based on Formal Logic

Visual question answering (VQA) has been gaining a lot of traction in the machine learning community in the recent years due to the challenges posed in understanding information coming from multiple modalities (i.e., images, language). In…

Computer Vision and Pattern Recognition · Computer Science 2021-11-11 Muralikrishnna G. Sethuraman , Ali Payani , Faramarz Fekri , J. Clayton Kerce

Visual Question Answering as Reading Comprehension

Visual question answering (VQA) demands simultaneous comprehension of both the image visual content and natural language questions. In some cases, the reasoning needs the help of common sense or general knowledge which usually appear in the…

Computer Vision and Pattern Recognition · Computer Science 2018-11-30 Hui Li , Peng Wang , Chunhua Shen , Anton van den Hengel

Visual Question Answering using Deep Learning: A Survey and Performance Analysis

The Visual Question Answering (VQA) task combines challenges for processing data with both Visual and Linguistic processing, to answer basic 'common sense' questions about given images. Given an image and a question in natural language, the…

Computer Vision and Pattern Recognition · Computer Science 2020-12-24 Yash Srivastava , Vaishnav Murali , Shiv Ram Dubey , Snehasis Mukherjee

Survey of Recent Advances in Visual Question Answering

Visual Question Answering (VQA) presents a unique challenge as it requires the ability to understand and encode the multi-modal inputs - in terms of image processing and natural language processing. The algorithm further needs to learn how…

Computer Vision and Pattern Recognition · Computer Science 2017-09-26 Supriya Pandhre , Shagun Sodhani

Visual question answering: from early developments to recent advances -- a survey

Visual Question Answering (VQA) is an evolving research field aimed at enabling machines to answer questions about visual content by integrating image and language processing techniques such as feature extraction, object detection, text…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Ngoc Dung Huynh , Mohamed Reda Bouadjenek , Sunil Aryal , Imran Razzak , Hakim Hacid

A Comprehensive Survey on Visual Question Answering Datasets and Algorithms

Visual question answering (VQA) refers to the problem where, given an image and a natural language question about the image, a correct natural language answer has to be generated. A VQA model has to demonstrate both the visual understanding…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Raihan Kabir , Naznin Haque , Md Saiful Islam , Marium-E-Jannat

Visual Question Answering: A Survey of Methods and Datasets

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires…

Computer Vision and Pattern Recognition · Computer Science 2016-07-21 Qi Wu , Damien Teney , Peng Wang , Chunhua Shen , Anthony Dick , Anton van den Hengel

Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey

Visual Question Answering (VQA) is a challenging task that combines natural language processing and computer vision techniques and gradually becomes a benchmark test task in multimodal large language models (MLLMs). The goal of our survey is…

Computation and Language · Computer Science 2024-11-27 Jiayi Kuang , Jingyou Xie , Haohao Luo , Ronghao Li , Zhe Xu , Xianfeng Cheng , Yinghui Li , Xika Lin , Ying Shen

The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering

Visual Question Answering (VQA) is an interdisciplinary field that bridges the gap between computer vision (CV) and natural language processing (NLP), enabling Artificial Intelligence (AI) systems to answer questions about images. Since its…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Anupam Pandey , Deepjyoti Bodo , Arpan Phukan , Asif Ekbal

VQA: Visual Question Answering

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios,…

Computation and Language · Computer Science 2016-10-28 Aishwarya Agrawal , Jiasen Lu , Stanislaw Antol , Margaret Mitchell , C. Lawrence Zitnick , Dhruv Batra , Devi Parikh

Visual Question Answering: Datasets, Algorithms, and Future Challenges

Visual Question Answering (VQA) is a recent problem in computer vision and natural language processing that has garnered a large amount of interest from the deep learning, computer vision, and natural language processing communities. In…

Computer Vision and Pattern Recognition · Computer Science 2017-06-16 Kushal Kafle , Christopher Kanan

Survey of Visual Question Answering: Datasets and Techniques

Visual question answering (or VQA) is a new and exciting problem that combines natural language processing and computer vision techniques. We present a survey of the various datasets and models that have been used to tackle this task. The…

Computation and Language · Computer Science 2017-05-12 Akshay Kumar Gupta

VoQA: Visual-only Question Answering

Visual understanding requires interpreting both natural scenes and the textual information that appears within them, motivating tasks such as Visual Question Answering (VQA). However, current VQA benchmarks overlook scenarios with visually…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Jianing An , Luyang Jiang , Jie Luo , Wenjun Wu , Lei Huang

Visual Question Answering: which investigated applications?

Visual Question Answering (VQA) is an extremely stimulating and challenging research area where Computer Vision (CV) and Natural Language Processing (NLP) have recently met. In image captioning and video summarization, the semantic…

Computer Vision and Pattern Recognition · Computer Science 2021-03-09 Silvio Barra , Carmen Bisogni , Maria De Marsico , Stefano Ricciardi

Visuo-Linguistic Question Answering (VLQA) Challenge

Understanding images and text together is an important aspect of cognition and building advanced Artificial Intelligence (AI) systems. As a community, we have achieved good benchmarks over language and vision domains separately, however…

Computer Vision and Pattern Recognition · Computer Science 2020-11-19 Shailaja Keyur Sampat , Yezhou Yang , Chitta Baral

Faithful Multimodal Explanation for Visual Question Answering

AI systems' ability to explain their reasoning is critical to their utility and trustworthiness. Deep neural networks have enabled significant progress on many challenging problems such as visual question answering (VQA). However, most of…

Computation and Language · Computer Science 2019-06-05 Jialin Wu , Raymond J. Mooney

Component Analysis for Visual Question Answering Architectures

Recent research advances in Computer Vision and Natural Language Processing have introduced novel tasks that are paving the way for solving AI-complete problems. One of those tasks is called Visual Question Answering (VQA). A VQA system…

Computer Vision and Pattern Recognition · Computer Science 2020-07-30 Camila Kolling , Jônatas Wehrmann , Rodrigo C. Barros

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

Visual Question Answering (VQA) in its ideal form lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA benchmarks to date are focused on questions…

Computer Vision and Pattern Recognition · Computer Science 2019-09-05 Kenneth Marino , Mohammad Rastegari , Ali Farhadi , Roozbeh Mottaghi

A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge

The Visual Question Answering (VQA) task aspires to provide a meaningful testbed for the development of AI models that can jointly reason over visual and natural language inputs. Despite a proliferation of VQA datasets, this goal is…

Computer Vision and Pattern Recognition · Computer Science 2022-06-06 Dustin Schwenk , Apoorv Khandelwal , Christopher Clark , Kenneth Marino , Roozbeh Mottaghi