Related papers: Deep Bayesian Active Learning for Multiple Correct…

Active Learning for Visual Question Answering: An Empirical Study

We present an empirical study of active learning for Visual Question Answering, where a deep VQA model selects informative question-image pairs from a pool and queries an oracle for answers to maximally improve its performance under a…

Computer Vision and Pattern Recognition · Computer Science · 2017-11-07 · Xiao Lin, Devi Parikh

Visual Question Answering using Deep Learning: A Survey and Performance Analysis

The Visual Question Answering (VQA) task combines challenges for processing data with both visual and linguistic processing to answer basic 'common sense' questions about given images. Given an image and a question in natural language, the…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-24 · Yash Srivastava, Vaishnav Murali, Shiv Ram Dubey, Snehasis Mukherjee

Visual Question Answering as a Meta Learning Task

The predominant approach to Visual Question Answering (VQA) demands that the model represents within its weights all of the information required to answer any question about any image. Learning this information from any real training set…

Computer Vision and Pattern Recognition · Computer Science · 2017-11-23 · Damien Teney, Anton van den Hengel

Learning Answer Embeddings for Visual Question Answering

We propose a novel probabilistic model for visual question answering (Visual QA). The key idea is to infer two sets of embeddings: one for the image and the question jointly and the other for the answers. The learning objective is to learn…

Computer Vision and Pattern Recognition · Computer Science · 2018-06-12 · Hexiang Hu, Wei-Lun Chao, Fei Sha

Learn 3D VQA Better with Active Selection and Reannotation

3D Visual Question Answering (3D VQA) is crucial for enabling models to perceive the physical world and perform spatial reasoning. In 3D VQA, the free-form nature of answers often leads to improper annotations that can confuse or mislead…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-19 · Shengli Zhou, Yang Liu, Feng Zheng

Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering

The field of visual question answering (VQA) has recently seen a surge in research focused on providing explanations for predicted answers. However, current systems mostly rely on separate models to predict answers and generate…

Computation and Language · Computer Science · 2023-02-14 · Chenxi Whitehouse, Tillman Weyde, Pranava Madhyastha

A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models

Visual question answering, as a recently proposed multimodal learning task, has enjoyed wide attention from the deep learning community. Lately, the focus was on developing new representation fusion methods and attention mechanisms to achieve…

Computer Vision and Pattern Recognition · Computer Science · 2017-08-03 · Ilija Ilievski, Jiashi Feng

An Analysis of Visual Question Answering Algorithms

In visual question answering (VQA), an algorithm must answer text-based questions about images. While multiple datasets for VQA have been created since late 2014, they all have flaws in both their content and the way algorithms are…

Computer Vision and Pattern Recognition · Computer Science · 2017-09-15 · Kushal Kafle, Christopher Kanan

A Comprehensive Survey on Visual Question Answering Datasets and Algorithms

Visual question answering (VQA) refers to the problem where, given an image and a natural language question about the image, a correct natural language answer has to be generated. A VQA model has to demonstrate both the visual understanding…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-19 · Raihan Kabir, Naznin Haque, Md Saiful Islam, Marium-E-Jannat

Variational Visual Question Answering for Uncertainty-Aware Selective Prediction

Despite remarkable progress in recent years, Vision Language Models (VLMs) remain prone to overconfidence and hallucinations on tasks such as Visual Question Answering (VQA) and Visual Reasoning. Bayesian methods can potentially improve…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-23 · Tobias Jan Wieczorek, Nathalie Daun, Mohammad Emtiyaz Khan, Marcus Rohrbach

Survey of Visual Question Answering: Datasets and Techniques

Visual question answering (or VQA) is a new and exciting problem that combines natural language processing and computer vision techniques. We present a survey of the various datasets and models that have been used to tackle this task. The…

Computation and Language · Computer Science · 2017-05-12 · Akshay Kumar Gupta

Revisiting Visual Question Answering Baselines

Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms…

Computer Vision and Pattern Recognition · Computer Science · 2016-11-24 · Allan Jabri, Armand Joulin, Laurens van der Maaten

Solving Visual Madlibs with Multiple Cues

This paper focuses on answering fill-in-the-blank style multiple choice questions from the Visual Madlibs dataset. Previous approaches to Visual Question Answering (VQA) have mainly used generic image features from networks trained on the…

Computer Vision and Pattern Recognition · Computer Science · 2016-08-12 · Tatiana Tommasi, Arun Mallya, Bryan Plummer, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg

The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions

One of the most intriguing features of the Visual Question Answering (VQA) challenge is the unpredictability of the questions. Extracting the information required to answer them demands a variety of image operations from detection and…

Computer Vision and Pattern Recognition · Computer Science · 2016-12-19 · Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel

A survey on VQA_Datasets and Approaches

Visual question answering (VQA) is a task that combines both the techniques of computer vision and natural language processing. It requires models to answer a text-based question according to the information contained in a visual. In recent…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-04 · Yeyun Zou, Qiyu Xie

Visual Question Answering based on Local-Scene-Aware Referring Expression Generation

Visual question answering requires a deep understanding of both images and natural language. However, most methods mainly focus on visual concepts, such as the relationships between various objects. The limited use of object categories…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-25 · Jung-Jun Kim, Dong-Gyu Lee, Jialin Wu, Hong-Gyu Jung, Seong-Whan Lee

Analyzing the Behavior of Visual Question Answering Models

Recently, a number of deep-learning based models have been proposed for the task of Visual Question Answering (VQA). The performance of most models is clustered around 60-70%. In this paper we propose systematic methods to analyze the…

Computation and Language · Computer Science · 2016-10-05 · Aishwarya Agrawal, Dhruv Batra, Devi Parikh

Visual Question Answering: A Survey of Methods and Datasets

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires…

Computer Vision and Pattern Recognition · Computer Science · 2016-07-21 · Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel

Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly

Machine learning has advanced dramatically, narrowing the accuracy gap to humans in multimodal tasks like visual question answering (VQA). However, while humans can say "I don't know" when they are uncertain (i.e., abstain from answering a…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-21 · Spencer Whitehead, Suzanne Petryk, Vedaad Shakib, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach

Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks

Visual Question Answering (VQA) has attracted much attention since it offers insight into the relationships between the multi-modal analysis of images and natural language. Most of the current algorithms are incapable of answering…

Computer Vision and Pattern Recognition · Computer Science · 2017-12-05 · Guohao Li, Hang Su, Wenwu Zhu