Related papers: Weakly Supervised Visual Question Answer Generatio…

Generating Question Relevant Captions to Aid Visual Question Answering

Visual question answering (VQA) and image captioning require a shared body of general knowledge connecting language and vision. We present a novel approach to improve VQA performance that exploits this connection by jointly generating…

Computer Vision and Pattern Recognition · Computer Science 2020-01-07 Jialin Wu, Zeyuan Hu, Raymond J. Mooney

WeaQA: Weak Supervision via Captions for Visual Question Answering

Methodologies for training visual question answering (VQA) models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets. This has led to heavy reliance on datasets and a lack of…

Computer Vision and Pattern Recognition · Computer Science 2021-05-31 Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

Weak Supervision Enhanced Generative Network for Question Generation

Automatic question generation according to an answer within the given passage is useful for many applications, such as question answering systems and dialogue systems. Current neural-based methods mostly take two steps which extract several…

Computation and Language · Computer Science 2019-07-02 Yutong Wang, Jiyuan Zheng, Qijiong Liu, Zhou Zhao, Jun Xiao, Yueting Zhuang

Joint Image Captioning and Question Answering

Answering visual questions requires acquiring daily common knowledge and modeling the semantic connections among different parts of images, which is too difficult for VQA systems to learn from images with answers as the only supervision. Meanwhile,…

Computation and Language · Computer Science 2018-05-23 Jialin Wu, Zeyuan Hu, Raymond J. Mooney

Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering

Knowledge-based visual question answering (VQA) requires answering questions with external knowledge in addition to the content of images. One dataset that is mostly used in evaluating knowledge-based VQA is OK-VQA, but it lacks a gold…

Computation and Language · Computer Science 2021-09-10 Man Luo, Yankai Zeng, Pratyay Banerjee, Chitta Baral

Analysis of Visual Question Answering Algorithms with attention model

Visual question answering (VQA) uses image processing algorithms to process the image and natural language processing methods to understand and answer the question. VQA is helpful to a visually impaired person, can be used for the security…

Computer Vision and Pattern Recognition · Computer Science 2023-05-31 Param Ahir, Hiteishi M. Diwanji

Visual Question Answering based on Local-Scene-Aware Referring Expression Generation

Visual question answering requires a deep understanding of both images and natural language. However, most methods mainly focus on visual concepts, such as the relationships between various objects. The limited use of object categories…

Computer Vision and Pattern Recognition · Computer Science 2021-01-25 Jung-Jun Kim, Dong-Gyu Lee, Jialin Wu, Hong-Gyu Jung, Seong-Whan Lee

Improving Visual Question Answering by Referring to Generated Paragraph Captions

Paragraph-style image captions describe diverse aspects of an image as opposed to the more common single-sentence captions that only provide an abstract description of the image. These paragraph captions can hence contain substantial…

Computation and Language · Computer Science 2019-06-17 Hyounghun Kim, Mohit Bansal

Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions

Visual Question Answering is a multi-modal task that aims to measure high-level visual understanding. Contemporary VQA models are restrictive in the sense that answers are obtained via classification over a limited vocabulary (in the case…

Computer Vision and Pattern Recognition · Computer Science 2021-06-18 Radhika Dua, Sai Srinivas Kancheti, Vineeth N Balasubramanian

Speech-Based Visual Question Answering

This paper introduces speech-based visual question answering (VQA), the task of generating an answer given an image and a spoken question. Two methods are studied: an end-to-end, deep neural network that directly uses audio waveforms as…

Computation and Language · Computer Science 2017-09-19 Ted Zhang, Dengxin Dai, Tinne Tuytelaars, Marie-Francine Moens, Luc Van Gool

Automatic Generation of Grounded Visual Questions

In this paper, we propose the first model to be able to generate visually grounded questions with diverse types for a single image. Visual question generation is an emerging topic which aims to ask questions in natural language based on…

Computer Vision and Pattern Recognition · Computer Science 2017-05-30 Shijie Zhang, Lizhen Qu, Shaodi You, Zhenglu Yang, Jiawan Zhang

iVQA: Inverse Visual Question Answering

We propose the inverse problem of Visual question answering (iVQA), and explore its suitability as a benchmark for visuo-linguistic understanding. The iVQA task is to generate a question that corresponds to a given image and answer pair.…

Computer Vision and Pattern Recognition · Computer Science 2018-03-19 Feng Liu, Tao Xiang, Timothy M. Hospedales, Wankou Yang, Changyin Sun

Visual Question Answering: A Survey of Methods and Datasets

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires…

Computer Vision and Pattern Recognition · Computer Science 2016-07-21 Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel

Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned

This paper focuses on enhancing the captions generated by image-caption generation systems. We propose an approach for improving caption generation systems by choosing the most closely related output to the image rather than the most likely…

Computation and Language · Computer Science 2023-07-10 Ahmed Sabir

An Analysis of Visual Question Answering Algorithms

In visual question answering (VQA), an algorithm must answer text-based questions about images. While multiple datasets for VQA have been created since late 2014, they all have flaws in both their content and the way algorithms are…

Computer Vision and Pattern Recognition · Computer Science 2017-09-15 Kushal Kafle, Christopher Kanan

Proposing Plausible Answers for Open-ended Visual Question Answering

Answering open-ended questions is an essential capability for any intelligent agent. One of the most interesting recent open-ended question answering challenges is Visual Question Answering (VQA) which attempts to evaluate a system's visual…

Computation and Language · Computer Science 2016-10-25 Omid Bakhshandeh, Trung Bui, Zhe Lin, Walter Chang

Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool

In recent years, visual question answering (VQA) has become topical. The premise of VQA's significance as a benchmark in AI is that both the image and textual question need to be well understood and mutually grounded in order to infer the…

Computer Vision and Pattern Recognition · Computer Science 2018-03-20 Feng Liu, Tao Xiang, Timothy M. Hospedales, Wankou Yang, Changyin Sun

A Comprehensive Survey on Visual Question Answering Datasets and Algorithms

Visual question answering (VQA) refers to the problem where, given an image and a natural language question about the image, a correct natural language answer has to be generated. A VQA model has to demonstrate both the visual understanding…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Raihan Kabir, Naznin Haque, Md Saiful Islam, Marium-E-Jannat

Guiding Visual Question Generation

In traditional Visual Question Generation (VQG), most images have multiple concepts (e.g. objects and categories) for which a question could be generated, but models are trained to mimic an arbitrary choice of concept as given in their…

Machine Learning · Computer Science 2022-07-27 Nihir Vedd, Zixu Wang, Marek Rei, Yishu Miao, Lucia Specia

Generative Visual Question Answering

Multi-modal tasks involving vision and language in deep learning continue to rise in popularity and are leading to the development of newer models that can generalize beyond the extent of their training data. The current models lack…

Computer Vision and Pattern Recognition · Computer Science 2023-07-21 Ethan Shen, Scotty Singh, Bhavesh Kumar