Related papers: Information Maximizing Visual Question Generation

Automatic Generation of Grounded Visual Questions

In this paper, we propose the first model to be able to generate visually grounded questions with diverse types for a single image. Visual question generation is an emerging topic which aims to ask questions in natural language based on…

Computer Vision and Pattern Recognition · Computer Science · 2017-05-30 · Shijie Zhang, Lizhen Qu, Shaodi You, Zhenglu Yang, Jiawan Zhang

Generating Natural Questions About an Image

There has been an explosion of work in the vision & language community during the past few years from image captioning to video transcription, and answering questions about images. These tasks have focused on literal descriptions of the…

Computation and Language · Computer Science · 2016-06-10 · Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende

Guiding Visual Question Generation

In traditional Visual Question Generation (VQG), most images have multiple concepts (e.g. objects and categories) for which a question could be generated, but models are trained to mimic an arbitrary choice of concept as given in their…

Machine Learning · Computer Science · 2022-07-27 · Nihir Vedd, Zixu Wang, Marek Rei, Yishu Miao, Lucia Specia

Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards

Despite significant progress in a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images is proven to be an inscrutable challenge. Towards this end, we propose a Deep…

Computer Vision and Pattern Recognition · Computer Science · 2017-11-22 · Junjie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton van den Hengel

Multi-VQG: Generating Engaging Questions for Multiple Images

Generating engaging content has drawn much recent attention in the NLP community. Asking questions is a natural way to respond to photos and promote awareness. However, most answers to questions in traditional question-answering (QA)…

Computation and Language · Computer Science · 2022-11-21 · Min-Hsuan Yeh, Vicent Chen, Ting-Hao 'Kenneth' Huang, Lun-Wei Ku

Customized Image Narrative Generation via Interactive Visual Question Generation and Answering

Image description task has been invariably examined in a static manner with qualitative presumptions held to be universally applicable, regardless of the scope or target of the description. In practice, however, different viewers may pay…

Computation and Language · Computer Science · 2018-05-02 · Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada

Visual Question Answering based on Local-Scene-Aware Referring Expression Generation

Visual question answering requires a deep understanding of both images and natural language. However, most methods mainly focus on visual concepts, such as the relationships between various objects. The limited use of object categories…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-25 · Jung-Jun Kim, Dong-Gyu Lee, Jialin Wu, Hong-Gyu Jung, Seong-Whan Lee

OptGAN: Optimizing and Interpreting the Latent Space of the Conditional Text-to-Image GANs

Text-to-image generation intends to automatically produce a photo-realistic image, conditioned on a textual description. It can be potentially employed in the field of art creation, data augmentation, photo-editing, etc. Although many…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-01 · Zhenxing Zhang, Lambert Schomaker

Learning to Disambiguate by Asking Discriminative Questions

The ability to ask questions is a powerful tool to gather information in order to learn about the world and resolve ambiguities. In this paper, we explore a novel problem of generating discriminative questions to help disambiguate visual…

Computer Vision and Pattern Recognition · Computer Science · 2017-08-10 · Yining Li, Chen Huang, Xiaoou Tang, Chen-Change Loy

Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models

Fashionable image generation aims to synthesize images of diverse fashion prevalent around the globe, helping fashion designers in real-time visualization by giving them a basic customized structure of how a specific design preference would…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-14 · Krishna Sri Ipsit Mantri, Nevasini Sasikumar

Let's Talk! Striking Up Conversations via Conversational Visual Question Generation

An engaging and provocative question can open up a great conversation. In this work, we explore a novel scenario: a conversation agent views a set of the user's photos (for example, from social media platforms) and asks an engaging question…

Artificial Intelligence · Computer Science · 2022-05-20 · Shih-Han Chan, Tsai-Lun Yang, Yun-Wei Chu, Chi-Yang Hsu, Ting-Hao Huang, Yu-Shian Chiu, Lun-Wei Ku

Bridging the Intent Gap: Knowledge-Enhanced Visual Generation

For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-22 · Yi Cheng, Ziwei Xu, Dongyun Lin, Harry Cheng, Yongkang Wong, Ying Sun, Joo Hwee Lim, Mohan Kankanhalli

Creativity: Generating Diverse Questions using Variational Autoencoders

Generating diverse questions for given images is an important task for computational education, entertainment and AI assistants. Different from many conventional prediction techniques is the need for algorithms to generate a diverse set of…

Computer Vision and Pattern Recognition · Computer Science · 2017-04-13 · Unnat Jain, Ziyu Zhang, Alexander Schwing

Analyzing Quality, Bias, and Performance in Text-to-Image Generative Models

Advances in generative models have led to significant interest in image synthesis, demonstrating the ability to generate high-quality images for a diverse range of text prompts. Despite this progress, most studies ignore the presence of…

Artificial Intelligence · Computer Science · 2024-07-02 · Nila Masrourisaadat, Nazanin Sedaghatkish, Fatemeh Sarshartehrani, Edward A. Fox

Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources

We propose a method for visual question answering which combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. This allows…

Computer Vision and Pattern Recognition · Computer Science · 2016-04-15 · Qi Wu, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel

Multimodal Differential Network for Visual Question Generation

Generating natural questions from an image is a semantic task that requires using visual and language modality to learn multimodal representations. Images can have multiple visual and language contexts that are relevant for generating…

Computation and Language · Computer Science · 2019-10-18 · Badri N. Patro, Sandeep Kumar, Vinod K. Kurmi, Vinay P. Namboodiri

Generating Natural Questions from Images for Multimodal Assistants

Generating natural, diverse, and meaningful questions from images is an essential task for multimodal assistants as it confirms whether they have understood the object and scene in the images properly. The research in visual question…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-08 · Alkesh Patel, Akanksha Bindal, Hadas Kotek, Christopher Klein, Jason Williams

Multi-View Data Generation Without View Supervision

The development of high-dimensional generative models has recently gained a great surge of interest with the introduction of variational auto-encoders and generative adversarial neural networks. Different variants have been proposed where…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-18 · Mickaël Chen, Ludovic Denoyer, Thierry Artières

Natural Language Generation from Visual Events: State-of-the-Art and Key Open Questions

In recent years, a substantial body of work in visually grounded natural language processing has focused on real-life multimodal scenarios such as describing content depicted in images or videos. However, comparatively less attention has…

Computation and Language · Computer Science · 2025-08-21 · Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle

The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents

Diffusion models have achieved success in high-fidelity data synthesis, yet their capacity for more complex, structured reasoning like text following tasks remains constrained. While advances in language models have leveraged strategies…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-29 · Yuwei Sun, Yuxuan Yao, Hui Li, Siyu Zhu