Related papers: Deep Bayesian Network for Visual Question Generati…

Multimodal Differential Network for Visual Question Generation

Generating natural questions from an image is a semantic task that requires using visual and language modality to learn multimodal representations. Images can have multiple visual and language contexts that are relevant for generating…

Computation and Language · Computer Science 2019-10-18 Badri N. Patro, Sandeep Kumar, Vinod K. Kurmi, Vinay P. Namboodiri

Generating Natural Questions About an Image

There has been an explosion of work in the vision & language community during the past few years from image captioning to video transcription, and answering questions about images. These tasks have focused on literal descriptions of the…

Computation and Language · Computer Science 2016-06-10 Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende

Deep Bayesian Active Learning for Multiple Correct Outputs

Typical active learning strategies are designed for tasks, such as classification, with the assumption that the output space is mutually exclusive. The assumption that these tasks always have exactly one correct answer has resulted in the…

Computer Vision and Pattern Recognition · Computer Science 2019-12-10 Khaled Jedoui, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

Generating Natural Questions from Images for Multimodal Assistants

Generating natural, diverse, and meaningful questions from images is an essential task for multimodal assistants as it confirms whether they have understood the object and scene in the images properly. The research in visual question…

Computer Vision and Pattern Recognition · Computer Science 2020-12-08 Alkesh Patel, Akanksha Bindal, Hadas Kotek, Christopher Klein, Jason Williams

Solving Visual Madlibs with Multiple Cues

This paper focuses on answering fill-in-the-blank style multiple choice questions from the Visual Madlibs dataset. Previous approaches to Visual Question Answering (VQA) have mainly used generic image features from networks trained on the…

Computer Vision and Pattern Recognition · Computer Science 2016-08-12 Tatiana Tommasi, Arun Mallya, Bryan Plummer, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg

Guiding Visual Question Generation

In traditional Visual Question Generation (VQG), most images have multiple concepts (e.g. objects and categories) for which a question could be generated, but models are trained to mimic an arbitrary choice of concept as given in their…

Machine Learning · Computer Science 2022-07-27 Nihir Vedd, Zixu Wang, Marek Rei, Yishu Miao, Lucia Specia

A Survey on Bayesian Deep Learning

A comprehensive artificial intelligence system needs to not only perceive the environment with different 'senses' (e.g., seeing and hearing) but also infer the world's conditional (or even causal) relations and corresponding uncertainty.…

Machine Learning · Statistics 2021-01-07 Hao Wang, Dit-Yan Yeung

C3VQG: Category Consistent Cyclic Visual Question Generation

Visual Question Generation (VQG) is the task of generating natural questions based on an image. Popular methods in the past have explored image-to-sequence architectures trained with maximum likelihood which have demonstrated meaningful…

Computer Vision and Pattern Recognition · Computer Science 2021-01-12 Shagun Uppal, Anish Madan, Sarthak Bhagat, Yi Yu, Rajiv Ratn Shah

Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering

Visual Question Answering (VQA) has emerged as one of the most challenging tasks in artificial intelligence due to its multi-modal nature. However, most existing VQA methods are incapable of handling Knowledge-based Visual Question…

Computer Vision and Pattern Recognition · Computer Science 2023-12-21 Chengxiang Yin, Zhengping Che, Kun Wu, Zhiyuan Xu, Jian Tang

Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference

The visual question generation (VQG) task aims to generate human-like questions from an image and potentially other side information (e.g. answer type). Previous works on VQG fall in two aspects: i) They suffer from one image to many…

Computer Vision and Pattern Recognition · Computer Science 2024-07-09 Kai Shen, Lingfei Wu, Siliang Tang, Fangli Xu, Bo Long, Yueting Zhuang, Jian Pei

Combining Multiple Cues for Visual Madlibs Question Answering

This paper presents an approach for answering fill-in-the-blank multiple choice questions from the Visual Madlibs dataset. Instead of generic and commonly used representations trained on the ImageNet classification task, our approach…

Computer Vision and Pattern Recognition · Computer Science 2018-02-09 Tatiana Tommasi, Arun Mallya, Bryan Plummer, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg

Modeling rapid language learning by distilling Bayesian priors into artificial neural networks

Humans can learn languages from remarkably little experience. Developing computational models that explain this ability has been a major challenge in cognitive science. Bayesian models that build in strong inductive biases - factors that…

Computation and Language · Computer Science 2023-05-25 R. Thomas McCoy, Thomas L. Griffiths

Multi-VQG: Generating Engaging Questions for Multiple Images

Generating engaging content has drawn much recent attention in the NLP community. Asking questions is a natural way to respond to photos and promote awareness. However, most answers to questions in traditional question-answering (QA)…

Computation and Language · Computer Science 2022-11-21 Min-Hsuan Yeh, Vincent Chen, Ting-Hao 'Kenneth' Huang, Lun-Wei Ku

A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input

We propose a method for automatically answering questions about images by bringing together recent advances from natural language processing and computer vision. We combine discrete reasoning with uncertain predictions by a multi-world…

Artificial Intelligence · Computer Science 2015-05-06 Mateusz Malinowski, Mario Fritz

ConVQG: Contrastive Visual Question Generation with Multimodal Guidance

Asking questions about visual environments is a crucial way for intelligent agents to understand rich multi-faceted scenes, raising the importance of Visual Question Generation (VQG) systems. Apart from being grounded to the image, existing…

Computer Vision and Pattern Recognition · Computer Science 2024-02-21 Li Mi, Syrielle Montariol, Javiera Castillo-Navarro, Xianjie Dai, Antoine Bosselut, Devis Tuia

Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge

With the breakthrough of multi-modal large language models, answering complex visual questions that demand advanced reasoning abilities and world knowledge has become a much more important testbed for developing AI models than ever.…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Haibo Wang, Weifeng Ge

Visual Question Answering based on Local-Scene-Aware Referring Expression Generation

Visual question answering requires a deep understanding of both images and natural language. However, most methods mainly focus on visual concepts, such as the relationships between various objects. The limited use of object categories…

Computer Vision and Pattern Recognition · Computer Science 2021-01-25 Jung-Jun Kim, Dong-Gyu Lee, Jialin Wu, Hong-Gyu Jung, Seong-Whan Lee

Towards Bayesian Deep Learning: A Framework and Some Existing Methods

While perception tasks such as visual object recognition and text understanding play an important role in human intelligence, the subsequent tasks that involve inference, reasoning and planning require an even higher level of intelligence.…

Machine Learning · Statistics 2016-09-06 Hao Wang, Dit-Yan Yeung

Visual Question Answering using Deep Learning: A Survey and Performance Analysis

The Visual Question Answering (VQA) task combines challenges for processing data with both Visual and Linguistic processing, to answer basic 'common sense' questions about given images. Given an image and a question in natural language, the…

Computer Vision and Pattern Recognition · Computer Science 2020-12-24 Yash Srivastava, Vaishnav Murali, Shiv Ram Dubey, Snehasis Mukherjee

A Question-Centric Model for Visual Question Answering in Medical Imaging

Deep learning methods have proven extremely effective at performing a variety of medical image analysis tasks. With their potential use in clinical routine, their lack of transparency has however been one of their few weak points, raising…

Computer Vision and Pattern Recognition · Computer Science 2020-03-23 Minh H. Vu, Tommy Löfstedt, Tufve Nyholm, Raphael Sznitman