Related papers: PlotQA: Reasoning over Scientific Plots

FigureQA: An Annotated Figure Dataset for Visual Reasoning

We introduce FigureQA, a visual reasoning corpus of over one million question-answer pairs grounded in over 100,000 images. The images are synthetic, scientific-style figures from five classes: line plots, dot-line plots, vertical and…

Computer Vision and Pattern Recognition · Computer Science 2018-02-26 Samira Ebrahimi Kahou, Vincent Michalski, Adam Atkinson, Akos Kadar, Adam Trischler, Yoshua Bengio

A Comprehensive Survey on Visual Question Answering Datasets and Algorithms

Visual question answering (VQA) refers to the problem where, given an image and a natural language question about the image, a correct natural language answer has to be generated. A VQA model has to demonstrate both the visual understanding…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Raihan Kabir, Naznin Haque, Md Saiful Islam, Marium-E-Jannat

DCQA: Document-Level Chart Question Answering towards Complex Reasoning and Common-Sense Understanding

Visually-situated languages such as charts and plots are omnipresent in real-world documents. These graphical depictions are human-readable and are often analyzed in visually-rich documents to address a variety of questions that necessitate…

Artificial Intelligence · Computer Science 2023-10-31 Anran Wu, Luwei Xiao, Xingjiao Wu, Shuwen Yang, Junjie Xu, Zisong Zhuang, Nian Xie, Cheng Jin, Liang He

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. We have developed a strong and robust question engine that leverages scene…

Computation and Language · Computer Science 2019-07-12 Drew A. Hudson, Christopher D. Manning

An Analysis of Visual Question Answering Algorithms

In visual question answering (VQA), an algorithm must answer text-based questions about images. While multiple datasets for VQA have been created since late 2014, they all have flaws in both their content and the way algorithms are…

Computer Vision and Pattern Recognition · Computer Science 2017-09-15 Kushal Kafle, Christopher Kanan

SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images

Visual question answering on document images that contain textual, visual, and layout information, called document VQA, has received much attention recently. Although many datasets have been proposed for developing document VQA systems,…

Computation and Language · Computer Science 2023-01-13 Ryota Tanaka, Kyosuke Nishida, Kosuke Nishida, Taku Hasegawa, Itsumi Saito, Kuniko Saito

A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge

The Visual Question Answering (VQA) task aspires to provide a meaningful testbed for the development of AI models that can jointly reason over visual and natural language inputs. Despite a proliferation of VQA datasets, this goal is…

Computer Vision and Pattern Recognition · Computer Science 2022-06-06 Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, Roozbeh Mottaghi

ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning

Charts are very popular for analyzing data. When exploring charts, people often ask a variety of complex reasoning questions that involve several logical and arithmetic operations. They also commonly refer to visual features of a chart in…

Computation and Language · Computer Science 2022-03-22 Ahmed Masry, Do Xuan Long, Jia Qing Tan, Shafiq Joty, Enamul Hoque

FashionVQA: A Domain-Specific Visual Question Answering System

Humans apprehend the world through various sensory modalities, yet language is their predominant communication channel. Machine learning systems need to draw on the same multimodal richness to have informed discourses with humans in natural…

Computer Vision and Pattern Recognition · Computer Science 2022-08-25 Min Wang, Ata Mahjoubfar, Anupama Joshi

ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering

In this paper, we propose a new dataset, ReasonVQA, for the Visual Question Answering (VQA) task. Our dataset is automatically integrated with structured encyclopedic knowledge and constructed using a low-cost framework, which is capable of…

Computer Vision and Pattern Recognition · Computer Science 2026-02-03 Duong T. Tran, Trung-Kien Tran, Manfred Hauswirth, Danh Le Phuoc

NOAHQA: Numerical Reasoning with Interpretable Graph Question Answering Dataset

While diverse question answering (QA) datasets have been proposed and contributed significantly to the development of deep learning models for QA tasks, the existing datasets fall short in two aspects. First, we lack QA datasets covering…

Computation and Language · Computer Science 2021-10-15 Qiyuan Zhang, Lei Wang, Sicheng Yu, Shuohang Wang, Yang Wang, Jing Jiang, Ee-Peng Lim

DocVQA: A Dataset for VQA on Document Images

We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. Detailed analysis of the dataset in comparison with similar datasets…

Computer Vision and Pattern Recognition · Computer Science 2021-01-06 Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar

ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding

Visual question answering is an important task in both natural language and vision understanding. However, in most of the public visual question answering datasets, such as VQA and CLEVR, the questions are human-generated and specific to the…

Computation and Language · Computer Science 2022-08-08 Bingning Wang, Feiyang Lv, Ting Yao, Yiming Yuan, Jin Ma, Yu Luo, Haijin Liang

A survey on VQA_Datasets and Approaches

Visual question answering (VQA) is a task that combines both the techniques of computer vision and natural language processing. It requires models to answer a text-based question according to the information contained in a visual. In recent…

Computer Vision and Pattern Recognition · Computer Science 2021-05-04 Yeyun Zou, Qiyu Xie

VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning

The ideal form of Visual Question Answering requires understanding, grounding and reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most existing VQA benchmarks are…

Computer Vision and Pattern Recognition · Computer Science 2023-03-07 Kang Chen, Xiangqian Wu

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning

Current visual question answering (VQA) tasks mainly consider answering human-annotated questions for natural images. However, aside from natural images, abstract diagrams with semantic richness are still understudied in visual…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei Zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu

FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts

Existing benchmarks for visual question answering lack in visual grounding and complexity, particularly in evaluating spatial reasoning skills. We introduce FlowVQA, a novel benchmark aimed at assessing the capabilities of visual…

Computation and Language · Computer Science 2024-07-01 Shubhankar Singh, Purvi Chaurasia, Yerram Varun, Pranshu Pandya, Vatsal Gupta, Vivek Gupta, Dan Roth

SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset

Visual Question Answering (VQA) has emerged as a promising area of research to develop AI-based systems for enabling interactive and immersive learning. Numerous VQA datasets have been introduced to facilitate various tasks, such as…

Computer Vision and Pattern Recognition · Computer Science 2024-10-31 Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Sunil Aryal, Imran Razzak, Hakim Hacid

BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models

We introduce a new test set for visual question answering (VQA) called BinaryVQA to push the limits of VQA models. Our dataset includes 7,800 questions across 1,024 images and covers a wide variety of objects, topics, and concepts. For easy…

Computer Vision and Pattern Recognition · Computer Science 2023-01-31 Ali Borji

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

Visual Question Answering (VQA) in its ideal form lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA benchmarks to date are focused on questions…

Computer Vision and Pattern Recognition · Computer Science 2019-09-05 Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi