Related papers: Solving Visual Madlibs with Multiple Cues

Combining Multiple Cues for Visual Madlibs Question Answering

This paper presents an approach for answering fill-in-the-blank multiple choice questions from the Visual Madlibs dataset. Instead of generic and commonly used representations trained on the ImageNet classification task, our approach…

Computer Vision and Pattern Recognition · Computer Science 2018-02-09 Tatiana Tommasi, Arun Mallya, Bryan Plummer, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg

Visual Question Answering using Deep Learning: A Survey and Performance Analysis

The Visual Question Answering (VQA) task combines challenges for processing data with both visual and linguistic processing, to answer basic 'common sense' questions about given images. Given an image and a question in natural language, the…

Computer Vision and Pattern Recognition · Computer Science 2020-12-24 Yash Srivastava, Vaishnav Murali, Shiv Ram Dubey, Snehasis Mukherjee

An Analysis of Visual Question Answering Algorithms

In visual question answering (VQA), an algorithm must answer text-based questions about images. While multiple datasets for VQA have been created since late 2014, they all have flaws in both their content and the way algorithms are…

Computer Vision and Pattern Recognition · Computer Science 2017-09-15 Kushal Kafle, Christopher Kanan

Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering

This paper proposes deep convolutional network models that utilize local and global context to make human activity label predictions in still images, achieving state-of-the-art performance on two recent datasets with hundreds of labels…

Computer Vision and Pattern Recognition · Computer Science 2016-07-29 Arun Mallya, Svetlana Lazebnik

Survey of Recent Advances in Visual Question Answering

Visual Question Answering (VQA) presents a unique challenge as it requires the ability to understand and encode the multi-modal inputs, in terms of image processing and natural language processing. The algorithm further needs to learn how…

Computer Vision and Pattern Recognition · Computer Science 2017-09-26 Supriya Pandhre, Shagun Sodhani

Visual Question Answering: A Survey of Methods and Datasets

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires…

Computer Vision and Pattern Recognition · Computer Science 2016-07-21 Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel

Survey of Visual Question Answering: Datasets and Techniques

Visual question answering (or VQA) is a new and exciting problem that combines natural language processing and computer vision techniques. We present a survey of the various datasets and models that have been used to tackle this task. The…

Computation and Language · Computer Science 2017-05-12 Akshay Kumar Gupta

Visual Question Answering Using Semantic Information from Image Descriptions

In this work, we propose a deep neural architecture that uses an attention mechanism which utilizes region based image features, the natural language question asked, and semantic knowledge extracted from the regions of an image to produce…

Computation and Language · Computer Science 2021-04-06 Tasmia Tasrin, Md Sultan Al Nahian, Brent Harrison

Question Type Guided Attention in Visual Question Answering

Visual Question Answering (VQA) requires integration of feature maps with drastically different structures and focus of the correct regions. Image descriptors have structures at multiple spatial scales, while lexical inputs inherently…

Computer Vision and Pattern Recognition · Computer Science 2018-07-20 Yang Shi, Tommaso Furlanello, Sheng Zha, Animashree Anandkumar

A survey on VQA_Datasets and Approaches

Visual question answering (VQA) is a task that combines both the techniques of computer vision and natural language processing. It requires models to answer a text-based question according to the information contained in a visual. In recent…

Computer Vision and Pattern Recognition · Computer Science 2021-05-04 Yeyun Zou, Qiyu Xie

A Comprehensive Survey on Visual Question Answering Datasets and Algorithms

Visual question answering (VQA) refers to the problem where, given an image and a natural language question about the image, a correct natural language answer has to be generated. A VQA model has to demonstrate both the visual understanding…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Raihan Kabir, Naznin Haque, Md Saiful Islam, Marium-E-Jannat

Learning Compositional Representation for Few-shot Visual Question Answering

Current methods of Visual Question Answering perform well on answers with ample training data but have limited accuracy on novel ones with few examples. However, humans can quickly adapt to these new categories with just a…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Dalu Guo, Dacheng Tao

Revisiting Visual Question Answering Baselines

Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms…

Computer Vision and Pattern Recognition · Computer Science 2016-11-24 Allan Jabri, Armand Joulin, Laurens van der Maaten

Visual Question Answering based on Local-Scene-Aware Referring Expression Generation

Visual question answering requires a deep understanding of both images and natural language. However, most methods mainly focus on visual concepts, such as the relationships between various objects. The limited use of object categories…

Computer Vision and Pattern Recognition · Computer Science 2021-01-25 Jung-Jun Kim, Dong-Gyu Lee, Jialin Wu, Hong-Gyu Jung, Seong-Whan Lee

Coarse-to-Fine Reasoning for Visual Question Answering

Bridging the semantic gap between image and question is an important step to improve the accuracy of the Visual Question Answering (VQA) task. However, most of the existing VQA methods focus on attention mechanisms or visual relations for…

Computer Vision and Pattern Recognition · Computer Science 2022-04-20 Binh X. Nguyen, Tuong Do, Huy Tran, Erman Tjiputra, Quang D. Tran, Anh Nguyen

Localized Questions in Medical Visual Question Answering

Visual Question Answering (VQA) models aim to answer natural language questions about given images. Due to its ability to ask questions that differ from those used when training the model, medical VQA has received substantial attention in…

Computer Vision and Pattern Recognition · Computer Science 2023-07-04 Sergio Tascon-Morales, Pablo Márquez-Neila, Raphael Sznitman

Visual Question Answering as a Meta Learning Task

The predominant approach to Visual Question Answering (VQA) demands that the model represents within its weights all of the information required to answer any question about any image. Learning this information from any real training set…

Computer Vision and Pattern Recognition · Computer Science 2017-11-23 Damien Teney, Anton van den Hengel

Analysis of Visual Question Answering Algorithms with attention model

Visual question answering (VQA) uses image processing algorithms to process the image and natural language processing methods to understand and answer the question. VQA is helpful to a visually impaired person, can be used for the security…

Computer Vision and Pattern Recognition · Computer Science 2023-05-31 Param Ahir, Hiteishi M. Diwanji

Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach

Visual Question Answering (VQA) concerns providing answers to Natural Language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in…

Artificial Intelligence · Computer Science 2020-02-03 Mehrdad Alizadeh, Barbara Di Eugenio

DualNet: Domain-Invariant Network for Visual Question Answering

Visual question answering (VQA) task not only bridges the gap between images and language, but also requires that specific contents within the image are understood as indicated by linguistic context of the question, in order to generate the…

Computer Vision and Pattern Recognition · Computer Science 2017-05-05 Kuniaki Saito, Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada