Related papers: InfographicVQA

A survey on VQA: Datasets and Approaches

Visual question answering (VQA) is a task that combines both the techniques of computer vision and natural language processing. It requires models to answer a text-based question according to the information contained in a visual. In recent…

Computer Vision and Pattern Recognition · Computer Science 2021-05-04 Yeyun Zou, Qiyu Xie

A Comprehensive Survey on Visual Question Answering Datasets and Algorithms

Visual question answering (VQA) refers to the problem where, given an image and a natural language question about the image, a correct natural language answer has to be generated. A VQA model has to demonstrate both the visual understanding…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Raihan Kabir, Naznin Haque, Md Saiful Islam, Marium-E-Jannat

Visual Question Answering: A Survey of Methods and Datasets

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires…

Computer Vision and Pattern Recognition · Computer Science 2016-07-21 Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel

VQA: Visual Question Answering

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios,…

Computation and Language · Computer Science 2016-10-28 Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh

PDFVQA: A New Dataset for Real-World VQA on PDF Documents

Document-based Visual Question Answering examines the document understanding of document images in conditions of natural language questions. We proposed a new document-based VQA dataset, PDF-VQA, to comprehensively examine the document…

Computer Vision and Pattern Recognition · Computer Science 2023-06-07 Yihao Ding, Siwen Luo, Hyunsuk Chung, Soyeon Caren Han

ViInfographicVQA: A Benchmark for Single and Multi-image Visual Question Answering on Vietnamese Infographics

Infographic Visual Question Answering (InfographicVQA) evaluates a model's ability to read and reason over data-rich, layout-heavy visuals that combine text, charts, icons, and design elements. Compared with scene-text or natural-image VQA,…

Computer Vision and Pattern Recognition · Computer Science 2025-12-16 Tue-Thu Van-Dinh, Hoang-Duy Tran, Truong-Binh Duong, Mai-Hanh Pham, Binh-Nam Le-Nguyen, Quoc-Thai Nguyen

Survey of Visual Question Answering: Datasets and Techniques

Visual question answering (or VQA) is a new and exciting problem that combines natural language processing and computer vision techniques. We present a survey of the various datasets and models that have been used to tackle this task. The…

Computation and Language · Computer Science 2017-05-12 Akshay Kumar Gupta

Document Collection Visual Question Answering

Current tasks and methods in Document Understanding aim to process documents as single elements. However, documents are usually organized in collections (historical records, purchase invoices), that provide context useful for their…

Information Retrieval · Computer Science 2023-04-04 Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny

Understanding the Role of Scene Graphs in Visual Question Answering

Visual Question Answering (VQA) is of tremendous interest to the research community with important applications such as aiding visually impaired users and image-based search. In this work, we explore the use of scene graphs for solving the…

Computer Vision and Pattern Recognition · Computer Science 2021-01-19 Vinay Damodaran, Sharanya Chakravarthy, Akshay Kumar, Anjana Umapathy, Teruko Mitamura, Yuta Nakashima, Noa Garcia, Chenhui Chu

DVQA: Understanding Data Visualizations via Question Answering

Bar charts are an effective way to convey numeric information, but today's algorithms cannot parse them. Existing methods fail when faced with even minor variations in appearance. Here, we present DVQA, a dataset that tests many aspects of…

Computer Vision and Pattern Recognition · Computer Science 2018-03-30 Kushal Kafle, Brian Price, Scott Cohen, Christopher Kanan

DocVQA: A Dataset for VQA on Document Images

We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. Detailed analysis of the dataset in comparison with similar datasets…

Computer Vision and Pattern Recognition · Computer Science 2021-01-06 Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar

SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images

Visual question answering on document images that contain textual, visual, and layout information, called document VQA, has received much attention recently. Although many datasets have been proposed for developing document VQA systems,…

Computation and Language · Computer Science 2023-01-13 Ryota Tanaka, Kyosuke Nishida, Kosuke Nishida, Taku Hasegawa, Itsumi Saito, Kuniko Saito

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning

Current visual question answering (VQA) tasks mainly consider answering human-annotated questions for natural images. However, aside from natural images, abstract diagrams with semantic richness are still understudied in visual…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei Zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu

Visual Question Answering using Deep Learning: A Survey and Performance Analysis

The Visual Question Answering (VQA) task combines challenges for processing data with both Visual and Linguistic processing, to answer basic 'common sense' questions about given images. Given an image and a question in natural language, the…

Computer Vision and Pattern Recognition · Computer Science 2020-12-24 Yash Srivastava, Vaishnav Murali, Shiv Ram Dubey, Snehasis Mukherjee

Survey of Recent Advances in Visual Question Answering

Visual Question Answering (VQA) presents a unique challenge as it requires the ability to understand and encode the multi-modal inputs - in terms of image processing and natural language processing. The algorithm further needs to learn how…

Computer Vision and Pattern Recognition · Computer Science 2017-09-26 Supriya Pandhre, Shagun Sodhani

Visual Question Answering as Reading Comprehension

Visual question answering (VQA) demands simultaneous comprehension of both the image visual content and natural language questions. In some cases, the reasoning needs the help of common sense or general knowledge which usually appear in the…

Computer Vision and Pattern Recognition · Computer Science 2018-11-30 Hui Li, Peng Wang, Chunhua Shen, Anton van den Hengel

Visuo-Linguistic Question Answering (VLQA) Challenge

Understanding images and text together is an important aspect of cognition and building advanced Artificial Intelligence (AI) systems. As a community, we have achieved good benchmarks over language and vision domains separately, however…

Computer Vision and Pattern Recognition · Computer Science 2020-11-19 Shailaja Keyur Sampat, Yezhou Yang, Chitta Baral

Visual Query Answering by Entity-Attribute Graph Matching and Reasoning

Visual Query Answering (VQA) is of great significance in offering people convenience: one can raise a question for details of objects, or high-level understanding about the scene, over an image. This paper proposes a novel method to address…

Computer Vision and Pattern Recognition · Computer Science 2019-03-19 Peixi Xiong, Huayi Zhan, Xin Wang, Baivab Sinha, Ying Wu

Visual Question Answering: Datasets, Algorithms, and Future Challenges

Visual Question Answering (VQA) is a recent problem in computer vision and natural language processing that has garnered a large amount of interest from the deep learning, computer vision, and natural language processing communities. In…

Computer Vision and Pattern Recognition · Computer Science 2017-06-16 Kushal Kafle, Christopher Kanan

InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts

Understanding infographic charts with design-driven visual elements (e.g., pictograms, icons) requires both visual recognition and reasoning, posing challenges for multimodal large language models (MLLMs). However, existing visual-question…

Computer Vision and Pattern Recognition · Computer Science 2025-10-30 Tianchi Xie, Minzhi Lin, Mengchen Liu, Yilin Ye, Changjian Chen, Shixia Liu