Related papers: Interactive Conversational Head Generation

Responsive Listening Head Generation: A Benchmark Dataset and Baseline

We present a new listening head generation benchmark, for synthesizing responsive feedbacks of a listener (e.g., nod, smile) during a face-to-face conversation. As the indispensable complement to talking heads generation, listening head…

Computer Vision and Pattern Recognition · Computer Science 2022-07-21 Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer

This paper reports our solution for ACM Multimedia ViCo 2022 Conversational Head Generation Challenge, which aims to generate vivid face-to-face conversation videos based on audio and reference images. Our solution focuses on training a…

Computer Vision and Pattern Recognition · Computer Science 2022-08-03 Ailin Huang, Zhewei Huang, Shuchang Zhou

Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline

In dyadic speaker-listener interactions, the listener's head reactions along with the speaker's head movements, constitute an important non-verbal semantic expression together. The listener Head generation task aims to synthesize responsive…

Computer Vision and Pattern Recognition · Computer Science 2023-07-20 Zhigang Chang, Weitai Hu, Qing Yang, Shibao Zheng

MANGO: Natural Multi-speaker 3D Talking Head Generation via 2D-Lifted Enhancement

Current audio-driven 3D head generation methods mainly focus on single-speaker scenarios, lacking natural, bidirectional listen-and-speak interaction. Achieving seamless conversational behavior, where speaking and listening states…

Computer Vision and Pattern Recognition · Computer Science 2026-01-06 Lei Zhu, Lijian Lin, Ye Zhu, Jiahao Wu, Xuehan Hou, Yu Li, Yunfei Liu, Jie Chen

DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations

In face-to-face conversations, individuals need to switch between speaking and listening roles seamlessly. Existing 3D talking head generation models focus solely on speaking or listening, neglecting the natural dynamics of interactive…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Ziqiao Peng, Yanbo Fan, Haoyu Wu, Xuan Wang, Hongyan Liu, Jun He, Zhaoxin Fan

Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation

In this paper, we propose a novel text-based talking-head video generation framework that synthesizes high-fidelity facial expressions and head motions in accordance with contextual sentiments as well as speech rhythm and pauses. To be…

Computer Vision and Pattern Recognition · Computer Science 2021-05-10 Lincheng Li, Suzhen Wang, Zhimeng Zhang, Yu Ding, Yixing Zheng, Xin Yu, Changjie Fan

Active Listener: Continuous Generation of Listener's Head Motion Response in Dyadic Interactions

A key component of dyadic spoken interactions is the contextually relevant non-verbal gestures, such as head movements that reflect a listener's response to the interlocutor's speech. Although significant progress has been made in the…

Robotics · Computer Science 2024-10-01 Bishal Ghosh, Emma Li, Tanaya Guha

TAVID: Text-Driven Audio-Visual Interactive Dialogue Generation

The objective of this paper is to jointly synthesize interactive videos and conversational speech from text and reference images. With the ultimate goal of building human-like conversational systems, recent studies have explored talking or…

Computer Vision and Pattern Recognition · Computer Science 2025-12-24 Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Joon Son Chung, Shinji Watanabe

A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation

Virtual humans have gained considerable attention in numerous industries, e.g., entertainment and e-commerce. As a core technology, synthesizing photorealistic face frames from target speech and facial identity has been actively studied…

Sound · Computer Science 2023-05-01 Bo-Kyeong Kim, Jaemin Kang, Daeun Seo, Hancheol Park, Shinkook Choi, Hyoung-Kyu Song, Hyungshin Kim, Sungsu Lim

Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering

Human conversation is a complex mechanism with subtle nuances. It is hence an ambitious goal to develop artificial intelligence agents that can participate fluently in a conversation. While we are still far from achieving this goal, recent…

Computer Vision and Pattern Recognition · Computer Science 2018-03-30 Unnat Jain, Svetlana Lazebnik, Alexander Schwing

ECHO: Towards Emotionally Appropriate and Contextually Aware Interactive Head Generation

In natural face-to-face interaction, participants seamlessly alternate between speaking and listening, producing facial behaviors (FBs) that are finely informed by long-range context and naturally exhibit contextual appropriateness and…

Computer Vision and Pattern Recognition · Computer Science 2026-03-19 Xiangyu Kong, Xiaoyu Jin, Yihan Pan, Haoqin Sun, Hengde Zhu, Xiaoming Xu, Xiaoming Wei, Lu Liu, Siyang Song

Proactive Human-Machine Conversation with Explicit Conversation Goals

Though great progress has been made for human-machine conversation, current dialogue system is still in its infancy: it usually converses passively and utters words more as a matter of response, rather than on its own initiatives. In this…

Computation and Language · Computer Science 2019-11-11 Wenquan Wu, Zhen Guo, Xiangyang Zhou, Hua Wu, Xiyuan Zhang, Rongzhong Lian, Haifeng Wang

Leveraging WaveNet for Dynamic Listening Head Modeling from Speech

The creation of listener facial responses aims to simulate interactive communication feedback from a listener during a face-to-face conversation. Our goal is to generate believable videos of listeners' heads that respond authentically to a…

Computer Vision and Pattern Recognition · Computer Science 2024-09-10 Minh-Duc Nguyen, Hyung-Jeong Yang, Seung-Won Kim, Ji-Eun Shin, Soo-Hyung Kim

PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable

Pre-training models have been proved effective for a wide range of natural language processing tasks. Inspired by this, we propose a novel dialogue generation pre-training framework to support various kinds of conversations, including…

Computation and Language · Computer Science 2020-05-01 Siqi Bao, Huang He, Fan Wang, Hua Wu, Haifeng Wang

EmoVOCA: Speech-Driven Emotional 3D Talking Heads

The domain of 3D talking head generation has witnessed significant progress in recent years. A notable challenge in this field consists in blending speech-related motions with expression dynamics, which is primarily caused by the lack of…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Federico Nocentini, Claudio Ferrari, Stefano Berretti

Talking Together: Synthesizing Co-Located 3D Conversations from Audio

We tackle the challenging task of generating complete 3D facial animations for two interacting, co-located participants from a mixed audio stream. While existing methods often produce disembodied "talking heads" akin to a video conference…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Mengyi Shan, Shouchieh Chang, Ziqian Bai, Shichen Liu, Yinda Zhang, Luchuan Song, Rohit Pandey, Sean Fanello, Zeng Huang

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

Talking head generation creates lifelike avatars from static portraits for virtual communication and content creation. However, current models do not yet convey the feeling of truly interactive communication, often generating one-way…

Machine Learning · Computer Science 2026-01-05 Taekyung Ki, Sangwon Jang, Jaehyeong Jo, Jaehong Yoon, Sung Ju Hwang

VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior

Audio-driven talking head generation has drawn much attention in recent years, and many efforts have been made in lip-sync, expressive facial expressions, natural head pose generation, and high video quality. However, no model has yet led…

Computer Vision and Pattern Recognition · Computer Science 2023-12-08 Xusen Sun, Longhao Zhang, Hao Zhu, Peng Zhang, Bang Zhang, Xinya Ji, Kangneng Zhou, Daiheng Gao, Liefeng Bo, Xun Cao

Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors

In this paper, we introduce a simple and novel framework for one-shot audio-driven talking head generation. Unlike prior works that require additional driving sources for controlled synthesis in a deterministic manner, we instead…

Graphics · Computer Science 2022-12-09 Zhentao Yu, Zixin Yin, Deyu Zhou, Duomin Wang, Finn Wong, Baoyuan Wang

Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion

Talking head generation is to synthesize a lip-synchronized talking head video by inputting an arbitrary face image and corresponding audio clips. Existing methods ignore not only the interaction and relationship of cross-modal information,…

Computer Vision and Pattern Recognition · Computer Science 2024-11-01 Sen Chen, Zhilei Liu, Jiaxing Liu, Longbiao Wang