Related papers: Probabilistic framework for solving Visual Dialog

DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue

Different from Visual Question Answering task that requires to answer only one question about an image, Visual Dialogue involves multiple questions which cover a broad range of visual content that could be related to any objects,…

Computer Vision and Pattern Recognition · Computer Science 2019-11-19 Xiaoze Jiang, Jing Yu, Zengchang Qin, Yingying Zhuang, Xingxing Zhang, Yue Hu, Qi Wu

Reasoning Visual Dialogs with Structural and Partial Observations

We propose a novel model to address the task of Visual Dialog which exhibits complex dialog structures. To obtain a reasonable answer based on the current question and the dialog history, the underlying semantic dependencies between dialog…

Computer Vision and Pattern Recognition · Computer Science 2019-05-30 Zilong Zheng, Wenguan Wang, Siyuan Qi, Song-Chun Zhu

Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models

Recent advancements in dialogue systems have highlighted the significance of integrating multimodal responses, which enable conveying ideas through diverse modalities rather than solely relying on text-based interactions. This enrichment…

Computation and Language · Computer Science 2024-07-08 Chang-Sheng Kao, Yun-Nung Chen

What's to know? Uncertainty as a Guide to Asking Goal-oriented Questions

One of the core challenges in Visual Dialogue problems is asking the question that will provide the most useful information towards achieving the required objective. Encouraging an agent to ask the right questions is difficult because we…

Artificial Intelligence · Computer Science 2018-12-18 Ehsan Abbasnejad, Qi Wu, Javen Shi, Anton van den Hengel

Multi-View Attention Network for Visual Dialog

Visual dialog is a challenging vision-language task in which a series of questions visually grounded by a given image are answered. To resolve the visual dialog task, a high-level understanding of various multimodal inputs (e.g., question,…

Artificial Intelligence · Computer Science 2020-10-08 Sungjin Park, Taesun Whang, Yeochan Yoon, Heuiseok Lim

DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog

Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image. It remains a challenging task since it requires the agent to fully understand a given question before making an…

Computation and Language · Computer Science 2019-12-19 Feilong Chen, Fandong Meng, Jiaming Xu, Peng Li, Bo Xu, Jie Zhou

Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning

Visual Dialog is a challenging vision-language task since the visual dialog agent needs to answer a series of questions after reasoning over both the image content and dialog history. Though existing methods try to deal with the cross-modal…

Computer Vision and Pattern Recognition · Computer Science 2022-04-18 Feilong Chen, Xiuyi Chen, Shuang Xu, Bo Xu

Mixing Modes: Active and Passive Integration of Speech, Text, and Visualization for Communicating Data Uncertainty

Interpreting uncertain data can be difficult, particularly if the data presentation is complex. We investigate the efficacy of different modalities for representing data and how to combine the strengths of each modality to facilitate the…

Human-Computer Interaction · Computer Science 2024-04-15 Chase Stokes, Chelsea Sanker, Bridget Cogley, Vidya Setlur

Modality-Balanced Models for Visual Dialogue

The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response to the dialogue. However, via manual analysis, we find that a large number of conversational questions can be…

Computation and Language · Computer Science 2020-01-20 Hyounghun Kim, Hao Tan, Mohit Bansal

Survey of Recent Advances in Visual Question Answering

Visual Question Answering (VQA) presents a unique challenge as it requires the ability to understand and encode the multi-modal inputs - in terms of image processing and natural language processing. The algorithm further needs to learn how…

Computer Vision and Pattern Recognition · Computer Science 2017-09-26 Supriya Pandhre, Shagun Sodhani

VU-BERT: A Unified framework for Visual Dialog

The visual dialog task attempts to train an agent to answer multi-turn questions given an image, which requires the deep understanding of interactions between the image and dialog history. Existing researches tend to employ the…

Computation and Language · Computer Science 2022-02-23 Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng, Jing Xiao

Modeling Coreference Relations in Visual Dialog

Visual dialog is a vision-language task where an agent needs to answer a series of questions grounded in an image based on the understanding of the dialog history and the image. The occurrences of coreference relations in the dialog makes…

Computer Vision and Pattern Recognition · Computer Science 2022-03-08 Mingxiao Li, Marie-Francine Moens

Visuo-Linguistic Question Answering (VLQA) Challenge

Understanding images and text together is an important aspect of cognition and building advanced Artificial Intelligence (AI) systems. As a community, we have achieved good benchmarks over language and vision domains separately, however…

Computer Vision and Pattern Recognition · Computer Science 2020-11-19 Shailaja Keyur Sampat, Yezhou Yang, Chitta Baral

Generative Visual Dialogue System via Adaptive Reasoning and Weighted Likelihood Estimation

The key challenge of generative Visual Dialogue (VD) systems is to respond to human queries with informative answers in natural and contiguous conversation flow. Traditional Maximum Likelihood Estimation (MLE)-based methods only learn from…

Computer Vision and Pattern Recognition · Computer Science 2019-08-15 Heming Zhang, Shalini Ghosh, Larry Heck, Stephen Walsh, Junting Zhang, Jie Zhang, C.-C. Jay Kuo

Learning Answer Embeddings for Visual Question Answering

We propose a novel probabilistic model for visual question answering (Visual QA). The key idea is to infer two sets of embeddings: one for the image and the question jointly and the other for the answers. The learning objective is to learn…

Computer Vision and Pattern Recognition · Computer Science 2018-06-12 Hexiang Hu, Wei-Lun Chao, Fei Sha

Building Goal-Oriented Dialogue Systems with Situated Visual Context

Most popular goal-oriented dialogue agents are capable of understanding the conversational context. However, with the surge of virtual assistants with screen, the next generation of agents are required to also understand screen context in…

Machine Learning · Computer Science 2021-11-26 Sanchit Agarwal, Jan Jezabek, Arijit Biswas, Emre Barut, Shuyang Gao, Tagyoung Chung

Visual Question Answering as Reading Comprehension

Visual question answering (VQA) demands simultaneous comprehension of both the image visual content and natural language questions. In some cases, the reasoning needs the help of common sense or general knowledge which usually appear in the…

Computer Vision and Pattern Recognition · Computer Science 2018-11-30 Hui Li, Peng Wang, Chunhua Shen, Anton van den Hengel

Deep Neural Networks for Visual Reasoning

Visual perception and language understanding are fundamental components of human intelligence, enabling them to understand and reason about objects and their interactions. It is crucial for machines to have this capacity to reason using…

Computer Vision and Pattern Recognition · Computer Science 2022-09-27 Thao Minh Le

MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model

Multimodal semantic understanding often has to deal with uncertainty, which means the obtained messages tend to refer to multiple targets. Such uncertainty is problematic for our interpretation, including inter- and intra-modal uncertainty.…

Computer Vision and Pattern Recognition · Computer Science 2023-07-21 Yatai Ji, Junjie Wang, Yuan Gong, Lin Zhang, Yanru Zhu, Hongfa Wang, Jiaxing Zhang, Tetsuya Sakai, Yujiu Yang

Proposing Plausible Answers for Open-ended Visual Question Answering

Answering open-ended questions is an essential capability for any intelligent agent. One of the most interesting recent open-ended question answering challenges is Visual Question Answering (VQA) which attempts to evaluate a system's visual…

Computation and Language · Computer Science 2016-10-25 Omid Bakhshandeh, Trung Bui, Zhe Lin, Walter Chang