Related papers: DiffMotion: Speech-Driven Gesture Synthesis Using …

200 papers

Along with the explosion of large language models, improvements in speech synthesis, advancements in hardware, and the evolution of computer graphics, the current bottleneck in creating digital humans lies in generating character movements…

Human-Computer Interaction · Computer Science 2026-01-30 Thanh Hoang-Minh

Beyond speech, the art of communication includes gestures. The automatic co-speech gesture generation draws much attention in computer animation. It is a challenging task due to the diversity of gestures and the difficulty of matching the…

Human-Computer Interaction · Computer Science 2023-05-09 Sicheng Yang , Zhiyong Wu , Minglei Li , Zhensong Zhang , Lei Hao , Weihong Bao , Ming Cheng , Long Xiao

Gestures play a key role in human communication. Recent methods for co-speech gesture generation, while managing to generate beat-aligned motions, struggle to generate gestures that are semantically aligned with the utterance. Compared to…

Computer Vision and Pattern Recognition · Computer Science 2024-03-27 Muhammad Hamza Mughal , Rishabh Dabral , Ikhsanul Habibie , Lucia Donatelli , Marc Habermann , Christian Theobalt

The automatic co-speech gesture generation draws much attention in computer animation. Previous works designed network structures on individual datasets, which resulted in a lack of data volume and generalizability across different motion…

Human-Computer Interaction · Computer Science 2023-09-14 Sicheng Yang , Zilin Wang , Zhiyong Wu , Minglei Li , Zhensong Zhang , Qiaochu Huang , Lei Hao , Songcen Xu , Xiaofei Wu , Changpeng Yang , Zonghong Dai

With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech. However, human spontaneous face-to-face conversation has both spoken and non-verbal aspects (here,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-15 Shivam Mehta , Siyang Wang , Simon Alexanderson , Jonas Beskow , Éva Székely , Gustav Eje Henter

We propose DiffSHEG, a Diffusion-based approach for Speech-driven Holistic 3D Expression and Gesture generation with arbitrary length. While previous works focused on co-speech gesture or expression generation individually, the joint…

Sound · Computer Science 2024-04-09 Junming Chen , Yunfei Liu , Jianan Wang , Ailing Zeng , Yu Li , Qifeng Chen

The automated synthesis of high-quality 3D gestures from speech is of significant value in virtual humans and gaming. Previous methods focus on synthesizing gestures that are synchronized with speech rhythm, yet they frequently overlook the…

Human-Computer Interaction · Computer Science 2024-09-24 Qingrong Cheng , Xu Li , Xinghui Fu , Fei Xia , Zhongqian Sun

Animating virtual avatars to make co-speech gestures facilitates various applications in human-machine interaction. The existing methods mainly rely on generative adversarial networks (GANs), which typically suffer from notorious mode…

Computer Vision and Pattern Recognition · Computer Science 2023-03-21 Lingting Zhu , Xian Liu , Xuanyu Liu , Rui Qian , Ziwei Liu , Lequan Yu

We propose a simple and novel method for generating 3D human motion from complex natural language sentences, which describe different velocity, direction and composition of all kinds of actions. Different from existing methods that use…

Computer Vision and Pattern Recognition · Computer Science 2023-04-17 Zhiyuan Ren , Zhihong Pan , Xin Zhou , Le Kang

Generating realistic motions for digital humans is a core but challenging part of computer animations and games, as human motions are both diverse in content and rich in styles. While the latest deep learning approaches have made…

Computer Vision and Pattern Recognition · Computer Science 2022-12-19 Ziyi Chang , Edmund J. C. Findlay , Haozheng Zhang , Hubert P. H. Shum

Diffusion models have demonstrated remarkable synthesis quality and diversity in generating co-speech gestures. However, the computationally intensive sampling steps associated with diffusion models hinder their practicality in real-world…

Graphics · Computer Science 2025-03-24 Yongkang Cheng , Shaoli Huang , Xuelin Chen , Jifeng Ning , Mingming Gong

Diffusion models have experienced a surge of interest as highly expressive yet efficiently trainable probabilistic models. We show that these models are an excellent fit for synthesising human motion that co-occurs with audio, e.g., dancing…

Machine Learning · Computer Science 2023-05-17 Simon Alexanderson , Rajmund Nagy , Jonas Beskow , Gustav Eje Henter

Audio-driven co-speech human gesture generation has made remarkable advancements recently. However, most previous works only focus on single person audio-driven gesture generation. We aim at solving the problem of conversational co-speech…

Human-Computer Interaction · Computer Science 2024-01-12 Haiwei Xue , Sicheng Yang , Zhensong Zhang , Zhiyong Wu , Minglei Li , Zonghong Dai , Helen Meng

Although previous co-speech gesture generation methods are able to synthesize motions in line with speech content, it is still not enough to handle diverse and complicated motion distribution. The key challenges are: 1) the one-to-many…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Lianying Yin , Yijun Wang , Tianyu He , Jinming Liu , Wei Zhao , Bohan Li , Xin Jin , Jianxin Lin

Diffusion models have shown great success in generating high-quality co-speech gestures for interactive humanoid robots or digital avatars from noisy input with the speech audio or text as conditions. However, they rarely focus on providing…

Human-Computer Interaction · Computer Science 2024-04-04 Zeyu Zhao , Nan Gao , Zhi Zeng , Guixuan Zhang , Jie Liu , Shuwu Zhang

Audio-driven co-speech video generation typically involves two stages: speech-to-gesture and gesture-to-video. While significant advances have been made in speech-to-gesture generation, synthesizing natural expressions and gestures remains…

Computer Vision and Pattern Recognition · Computer Science 2025-04-14 Renda Li , Xiaohua Qi , Qiang Ling , Jun Yu , Ziyi Chen , Peng Chang , Mei Han , Jing Xiao

Audio-driven talking video generation has advanced significantly, but existing methods often depend on video-to-video translation techniques and traditional generative networks like GANs, and they typically generate talking heads and…

Computer Vision and Pattern Recognition · Computer Science 2024-09-13 Steven Hogue , Chenxu Zhang , Hamza Daruger , Yapeng Tian , Xiaohu Guo

This paper presents a novel framework for speech-driven gesture production, applicable to virtual agents to enhance human-computer interaction. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven…

Computer Vision and Pattern Recognition · Computer Science 2021-04-07 Taras Kucherenko , Dai Hasegawa , Naoshi Kaneko , Gustav Eje Henter , Hedvig Kjellström

Co-speech gestures, if presented in the lively form of videos, can achieve superior visual effects in human-machine interaction. While previous works mostly generate structural human skeletons, resulting in the omission of appearance…

Computer Vision and Pattern Recognition · Computer Science 2024-04-03 Xu He , Qiaochu Huang , Zhensong Zhang , Zhiwei Lin , Zhiyong Wu , Sicheng Yang , Minglei Li , Zhiyi Chen , Songcen Xu , Xiaofei Wu

Synthesizing synchronized and natural co-speech gesture videos remains a formidable challenge. Recent approaches have leveraged motion graphs to harness the potential of existing video data. To retrieve an appropriate trajectory from the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-03 Yafei Song , Peng Zhang , Bang Zhang