Related papers: DiffMotion: Speech-Driven Gesture Synthesis Using …

A conversational gesture synthesis system based on emotions and semantics

Along with the explosion of large language models, improvements in speech synthesis, advancements in hardware, and the evolution of computer graphics, the current bottleneck in creating digital humans lies in generating character movements…

Human-Computer Interaction · Computer Science 2026-01-30 Thanh Hoang-Minh

DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models

Beyond speech, the art of communication includes gestures. Automatic co-speech gesture generation draws much attention in computer animation. It is a challenging task due to the diversity of gestures and the difficulty of matching the…

Human-Computer Interaction · Computer Science 2023-05-09 Sicheng Yang , Zhiyong Wu , Minglei Li , Zhensong Zhang , Lei Hao , Weihong Bao , Ming Cheng , Long Xiao

ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis

Gestures play a key role in human communication. Recent methods for co-speech gesture generation, while managing to generate beat-aligned motions, struggle to generate gestures that are semantically aligned with the utterance. Compared to…

Computer Vision and Pattern Recognition · Computer Science 2024-03-27 Muhammad Hamza Mughal , Rishabh Dabral , Ikhsanul Habibie , Lucia Donatelli , Marc Habermann , Christian Theobalt

UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons

The automatic co-speech gesture generation draws much attention in computer animation. Previous works designed network structures on individual datasets, which resulted in a lack of data volume and generalizability across different motion…

Human-Computer Interaction · Computer Science 2023-09-14 Sicheng Yang , Zilin Wang , Zhiyong Wu , Minglei Li , Zhensong Zhang , Qiaochu Huang , Lei Hao , Songcen Xu , Xiaofei Wu , Changpeng Yang , Zonghong Dai

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech. However, human spontaneous face-to-face conversation has both spoken and non-verbal aspects (here,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-15 Shivam Mehta , Siyang Wang , Simon Alexanderson , Jonas Beskow , Éva Székely , Gustav Eje Henter

DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation

We propose DiffSHEG, a Diffusion-based approach for Speech-driven Holistic 3D Expression and Gesture generation with arbitrary length. While previous works focused on co-speech gesture or expression generation individually, the joint…

Sound · Computer Science 2024-04-09 Junming Chen , Yunfei Liu , Jianan Wang , Ailing Zeng , Yu Li , Qifeng Chen

SIGGesture: Generalized Co-Speech Gesture Synthesis via Semantic Injection with Large-Scale Pre-Training Diffusion Models

The automated synthesis of high-quality 3D gestures from speech is of significant value in virtual humans and gaming. Previous methods focus on synthesizing gestures that are synchronized with speech rhythm, yet they frequently overlook the…

Human-Computer Interaction · Computer Science 2024-09-24 Qingrong Cheng , Xu Li , Xinghui Fu , Fei Xia , Zhongqian Sun

Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

Animating virtual avatars to make co-speech gestures facilitates various applications in human-machine interaction. The existing methods mainly rely on generative adversarial networks (GANs), which typically suffer from notorious mode…

Computer Vision and Pattern Recognition · Computer Science 2023-03-21 Lingting Zhu , Xian Liu , Xuanyu Liu , Rui Qian , Ziwei Liu , Lequan Yu

Diffusion Motion: Generate Text-Guided 3D Human Motion by Diffusion Model

We propose a simple and novel method for generating 3D human motion from complex natural language sentences, which describe different velocity, direction and composition of all kinds of actions. Different from existing methods that use…

Computer Vision and Pattern Recognition · Computer Science 2023-04-17 Zhiyuan Ren , Zhihong Pan , Xin Zhou , Le Kang

Unifying Human Motion Synthesis and Style Transfer with Denoising Diffusion Probabilistic Models

Generating realistic motions for digital humans is a core but challenging part of computer animations and games, as human motions are both diverse in content and rich in styles. While the latest deep learning approaches have made…

Computer Vision and Pattern Recognition · Computer Science 2022-12-19 Ziyi Chang , Edmund J. C. Findlay , Haozheng Zhang , Hubert P. H. Shum

DIDiffGes: Decoupled Semi-Implicit Diffusion Models for Real-time Gesture Generation from Speech

Diffusion models have demonstrated remarkable synthesis quality and diversity in generating co-speech gestures. However, the computationally intensive sampling steps associated with diffusion models hinder their practicality in real-world…

Graphics · Computer Science 2025-03-24 Yongkang Cheng , Shaoli Huang , Xuelin Chen , Jifeng Ning , Mingming Gong

Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models

Diffusion models have experienced a surge of interest as highly expressive yet efficiently trainable probabilistic models. We show that these models are an excellent fit for synthesising human motion that co-occurs with audio, e.g., dancing…

Machine Learning · Computer Science 2023-05-17 Simon Alexanderson , Rajmund Nagy , Jonas Beskow , Gustav Eje Henter

Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models

Audio-driven co-speech human gesture generation has made remarkable advancements recently. However, most previous works only focus on single person audio-driven gesture generation. We aim at solving the problem of conversational co-speech…

Human-Computer Interaction · Computer Science 2024-01-12 Haiwei Xue , Sicheng Yang , Zhensong Zhang , Zhiyong Wu , Minglei Li , Zonghong Dai , Helen Meng

EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model

Although previous co-speech gesture generation methods are able to synthesize motions in line with speech content, it is still not enough to handle diverse and complicated motion distribution. The key challenges are: 1) the one-to-many…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Lianying Yin , Yijun Wang , Tianyu He , Jinming Liu , Wei Zhao , Bohan Li , Xin Jin , Jianxin Lin

A Unified Editing Method for Co-Speech Gesture Generation via Diffusion Inversion

Diffusion models have shown great success in generating high-quality co-speech gestures for interactive humanoid robots or digital avatars from noisy input with the speech audio or text as conditions. However, they rarely focus on providing…

Human-Computer Interaction · Computer Science 2024-04-04 Zeyu Zhao , Nan Gao , Zhi Zeng , Guixuan Zhang , Jie Liu , Shuwu Zhang

EasyGenNet: An Efficient Framework for Audio-Driven Gesture Video Generation Based on Diffusion Model

Audio-driven co-speech video generation typically involves two stages: speech-to-gesture and gesture-to-video. While significant advances have been made in speech-to-gesture generation, synthesizing natural expressions and gestures remains…

Computer Vision and Pattern Recognition · Computer Science 2025-04-14 Renda Li , Xiaohua Qi , Qiang Ling , Jun Yu , Ziyi Chen , Peng Chang , Mei Han , Jing Xiao

DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures

Audio-driven talking video generation has advanced significantly, but existing methods often depend on video-to-video translation techniques and traditional generative networks like GANs, and they typically generate talking heads and…

Computer Vision and Pattern Recognition · Computer Science 2024-09-13 Steven Hogue , Chenxu Zhang , Hamza Daruger , Yapeng Tian , Xiaohu Guo

Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation

This paper presents a novel framework for speech-driven gesture production, applicable to virtual agents to enhance human-computer interaction. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven…

Computer Vision and Pattern Recognition · Computer Science 2021-04-07 Taras Kucherenko , Dai Hasegawa , Naoshi Kaneko , Gustav Eje Henter , Hedvig Kjellström

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

Co-speech gestures, if presented in the lively form of videos, can achieve superior visual effects in human-machine interaction. While previous works mostly generate structural human skeletons, resulting in the omission of appearance…

Computer Vision and Pattern Recognition · Computer Science 2024-04-03 Xu He , Qiaochu Huang , Zhensong Zhang , Zhiwei Lin , Zhiyong Wu , Sicheng Yang , Minglei Li , Zhiyi Chen , Songcen Xu , Xiaofei Wu

Co-speech Gesture Video Generation via Motion-Based Graph Retrieval

Synthesizing synchronized and natural co-speech gesture videos remains a formidable challenge. Recent approaches have leveraged motion graphs to harness the potential of existing video data. To retrieve an appropriate trajectory from the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-03 Yafei Song , Peng Zhang , Bang Zhang