Related papers: Visual Dialog

200 papers

This work aims to create a multimodal AI system that chats with humans and shares relevant photos. While earlier works were limited to dialogues about specific objects or scenes within images, recent works have incorporated images into…

Computation and Language · Computer Science 2023-05-08 Min Young Lee

Unlike the Visual Question Answering task, which requires answering only one question about an image, Visual Dialogue involves multiple questions which cover a broad range of visual content that could be related to any objects,…

Computer Vision and Pattern Recognition · Computer Science 2019-11-19 Xiaoze Jiang, Jing Yu, Zengchang Qin, Yingying Zhuang, Xingxing Zhang, Yue Hu, Qi Wu

We introduce Affective Visual Dialog, an emotion explanation and reasoning task as a testbed for research on understanding the formation of emotions in visually grounded conversations. The task involves three skills: (1) Dialog-based…

Computation and Language · Computer Science 2025-01-03 Kilichbek Haydarov, Xiaoqian Shen, Avinash Madasu, Mahmoud Salem, Li-Jia Li, Gamaleldin Elsayed, Mohamed Elhoseiny

Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image. It remains a challenging task since it requires the agent to fully understand a given question before making an…

Computation and Language · Computer Science 2019-12-19 Feilong Chen, Fandong Meng, Jiaming Xu, Peng Li, Bo Xu, Jie Zhou

Current vision and language tasks usually take complete visual data (e.g., raw images or videos) as input; however, practical scenarios may often include situations where part of the visual information becomes inaccessible due to…

Computer Vision and Pattern Recognition · Computer Science 2021-06-29 Ye Zhu, Yu Wu, Yi Yang, Yan Yan

In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking the initial step towards creating an avatar chatbot system…

Computer Vision and Pattern Recognition · Computer Science 2024-08-05 Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, Yong Man Ro

Visual dialog is a vision-language task where an agent needs to answer a series of questions grounded in an image based on the understanding of the dialog history and the image. The occurrence of coreference relations in the dialog makes…

Computer Vision and Pattern Recognition · Computer Science 2022-03-08 Mingxiao Li, Marie-Francine Moens

When humans converse, what a speaker will say next significantly depends on what he sees. Unfortunately, existing dialogue models generate dialogue utterances only based on preceding textual contexts, and visual contexts are rarely…

Computation and Language · Computer Science 2021-06-01 Yuxian Meng, Shuhe Wang, Qinghong Han, Xiaofei Sun, Fei Wu, Rui Yan, Jiwei Li

Visual Dialog is a multimodal task of answering a sequence of questions grounded in an image, using the conversation history as context. It entails challenges in vision, language, reasoning, and grounding. However, studying these subtasks…

Computer Vision and Pattern Recognition · Computer Science 2019-09-20 Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach

Human conversation is a complex mechanism with subtle nuances. It is hence an ambitious goal to develop artificial intelligence agents that can participate fluently in a conversation. While we are still far from achieving this goal, recent…

Computer Vision and Pattern Recognition · Computer Science 2018-03-30 Unnat Jain, Svetlana Lazebnik, Alexander Schwing

Visual Dialog involves "understanding" the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2020-05-18 Shubham Agarwal, Trung Bui, Joon-Young Lee, Ioannis Konstas, Verena Rieser

In this paper, we build a visual dialogue dataset, named InfoVisDial, which provides rich informative answers in each round even with external knowledge related to the visual content. Different from existing datasets where the answer is…

Computer Vision and Pattern Recognition · Computer Science 2023-12-22 Bingbing Wen, Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Bill Howe, Lijuan Wang

The intelligent dialogue system, aiming at communicating with humans harmoniously with natural language, is brilliant for promoting the advancement of human-machine interaction in the era of artificial intelligence. With the gradually…

Artificial Intelligence · Computer Science 2022-07-05 Hao Wang, Bin Guo, Yating Zeng, Yasan Ding, Chen Qiu, Ying Zhang, Lina Yao, Zhiwen Yu

Visual dialog is a challenging vision-language task in which a series of questions visually grounded by a given image are answered. To resolve the visual dialog task, a high-level understanding of various multimodal inputs (e.g., question,…

Artificial Intelligence · Computer Science 2020-10-08 Sungjin Park, Taesun Whang, Yeochan Yoon, Heuiseok Lim

What if the patterns hidden within dialogue reveal more about communication than the words themselves? We introduce Conversational DNA, a novel visual language that treats any dialogue -- whether between humans, between human and AI, or…

Human-Computer Interaction · Computer Science 2025-08-12 Baihan Lin

Building a socially intelligent agent involves many challenges, one of which is to teach the agent to speak guided by its value like a human. However, value-driven chatbots are still understudied in the area of dialogue systems. Most…

Computation and Language · Computer Science 2022-07-25 Liang Qiu, Yizhou Zhao, Jinchao Li, Pan Lu, Baolin Peng, Jianfeng Gao, Song-Chun Zhu

The Visual Dialogue task requires an agent to engage in a conversation about an image with a human. It represents an extension of the Visual Question Answering task in that the agent needs to answer a question about an image, but it needs…

Computer Vision and Pattern Recognition · Computer Science 2017-11-22 Qi Wu, Peng Wang, Chunhua Shen, Ian Reid, Anton van den Hengel

Natural human conversation is full-duplex and audio-visual: people simultaneously speak and listen while continuously interpreting and producing nonverbal cues, such as nods, smiles, and gestures. To support successful human-agent…

Computer Vision and Pattern Recognition · Computer Science 2026-05-29 Amrita Mazumdar, Seonwook Park, Rajarshi Roy, Nikhil Srihari, Shengze Wang, Yuhao Zhou, Julia Wang, Koki Nagano, Shalini De Mello

We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction, which offers real-time audio-visual responses to instant speech inquiries. Compared to traditional text or voice-based systems, ViDA-MAN offers human-like…

Computer Vision and Pattern Recognition · Computer Science 2021-10-27 Tong Shen, Jiawei Zuo, Fan Shi, Jin Zhang, Liqin Jiang, Meng Chen, Zhengchen Zhang, Wei Zhang, Xiaodong He, Tao Mei

Describing images with text is a fundamental problem in vision-language research. Current studies in this domain mostly focus on single image captioning. However, in various real applications (e.g., image editing, difference interpretation,…

Computation and Language · Computer Science 2019-06-20 Hao Tan, Franck Dernoncourt, Zhe Lin, Trung Bui, Mohit Bansal