Related papers: Facial Keypoint Sequence Generation from Audio

Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion

We propose an audio-driven talking-head method to generate photo-realistic talking-head videos from a single reference image. In this work, we tackle two key challenges: (i) producing natural head motions that match speech prosody, and (ii)…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-21 · Suzhen Wang, Lincheng Li, Yu Ding, Changjie Fan, Xin Yu

SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory

The challenge of talking face generation from speech lies in aligning two different modal information, audio and video, such that the mouth region corresponds to input audio. Previous methods either exploit audio-visual representation…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-04 · Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro

Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose

Real-world talking faces often accompany with natural head movement. However, most existing talking face video generation methods only consider facial animation with fixed head pose. In this paper, we address this problem by proposing a…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-06 · Ran Yi, Zipeng Ye, Juyong Zhang, Hujun Bao, Yong-Jin Liu

JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing

Significant progress has been made in talking-face video generation research; however, precise lip-audio synchronization and high visual quality remain challenging in editing lip shapes based on input audio. This paper introduces JoyGen, a…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-06 · Qili Wang, Dajiang Wu, Zihang Xu, Junshi Huang, Jun Lv

Robust One Shot Audio to Video Generation

Audio to Video generation is an interesting problem that has numerous applications across industry verticals including film making, multi-media, marketing, education and others. High-quality video generation with expressive facial movements…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-16 · Neeraj Kumar, Srishti Goel, Ankur Narang, Mujtaba Hasan

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech. This is a challenging task because face appearance variation and semantics of speech are coupled together in the subtle movements of…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-24 · Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang

RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

Person-generic audio-driven face generation is a challenging task in computer vision. Previous methods have achieved remarkable progress in audio-visual synchronization, but there is still a significant gap between current results and…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-09 · Xiaozhong Ji, Chuming Lin, Zhonggan Ding, Ying Tai, Junwei Zhu, Xiaobin Hu, Donghao Luo, Yanhao Ge, Chengjie Wang

One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning

Audio-driven one-shot talking face generation methods are usually trained on video resources of various persons. However, their created videos often suffer unnatural mouth shapes and asynchronous lips because those methods struggle to learn…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-07 · Suzhen Wang, Lincheng Li, Yu Ding, Xin Yu

KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation

Current audio-driven facial animation methods achieve impressive results for short videos but suffer from error accumulation and identity drift when extended to longer durations. Existing methods attempt to mitigate this through external…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-20 · Antoni Bigata, Michał Stypułkowski, Rodrigo Mira, Stella Bounareli, Konstantinos Vougioukas, Zoe Landgraf, Nikita Drobyshev, Maciej Zieba, Stavros Petridis, Maja Pantic

End-to-End Speech-Driven Facial Animation with Temporal GANs

Speech-driven facial animation is the process which uses speech signals to automatically synthesize a talking character. The majority of work in this domain creates a mapping from audio features to visual features. This often requires…

Audio and Speech Processing · Electrical Eng. & Systems · 2018-07-20 · Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

Controllable Talking Face Generation by Implicit Facial Keypoints Editing

Audio-driven talking face generation has garnered significant interest within the domain of digital human research. Existing methods are encumbered by intricate model architectures that are intricately dependent on each other, complicating…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-08 · Dong Zhao, Jiaying Shi, Wenjun Li, Shudong Wang, Shenghui Xu, Zhaoming Pan

Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video

Synthesizing realistic videos according to a given speech is still an open challenge. Previous works have been plagued by issues such as inaccurate lip shape generation and poor image quality. The key reason is that only motions and…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-12 · Xiuzhe Wu, Pengfei Hu, Yang Wu, Xiaoyang Lyu, Yan-Pei Cao, Ying Shan, Wenming Yang, Zhongqian Sun, Xiaojuan Qi

KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding

We present a novel approach for synthesizing 3D facial motions from audio sequences using key motion embeddings. Despite recent advancements in data-driven techniques, accurately mapping between audio signals and 3D facial meshes remains…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-04 · Zhihao Xu, Shengjie Gong, Jiapeng Tang, Lingyu Liang, Yining Huang, Haojie Li, Shuangping Huang

Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation

Audio-Driven Talking Face Generation aims at generating realistic videos of talking faces, focusing on accurate audio-lip synchronization without deteriorating any identity-related visual details. Recent state-of-the-art methods are based…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-29 · Dogucan Yaman, Fevziye Irem Eyiokur, Leonard Bärmann, Hazım Kemal Ekenel, Alexander Waibel

Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations

Audio-driven talking face generation, which aims to synthesize talking faces with realistic facial animations (including accurate lip movements, vivid facial expression details and natural head poses) corresponding to the audio, has…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-19 · Rongliang Wu, Yingchen Yu, Fangneng Zhan, Jiahui Zhang, Xiaoqin Zhang, Shijian Lu

Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks

We propose an end to end deep learning approach for generating real-time facial animation from just audio. Specifically, our deep architecture employs deep bidirectional long short-term memory network and attention mechanism to discover the…

Machine Learning · Computer Science · 2019-05-28 · Guanzhong Tian, Yi Yuan, Yong Liu

StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation

We propose StyleTalker, a novel audio-driven talking head generation model that can synthesize a video of a talking person from a single reference image with accurately audio-synced lip shapes, realistic head poses, and eye blinks.…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-18 · Dongchan Min, Minyoung Song, Eunji Ko, Sung Ju Hwang

EMO2: End-Effector Guided Audio-Driven Avatar Video Generation

In this paper, we propose a novel audio-driven talking head method capable of simultaneously generating highly expressive facial expressions and hand gestures. Unlike existing methods that focus on generating full-body or half-body poses,…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-22 · Linrui Tian, Siqi Hu, Qi Wang, Bang Zhang, Liefeng Bo

Realistic Speech-Driven Facial Animation with GANs

Speech-driven facial animation is the process that automatically synthesizes talking characters based on speech signals. The majority of work in this domain creates a mapping from audio features to visual features. This approach often…

Computer Vision and Pattern Recognition · Computer Science · 2019-06-18 · Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis

Audio driven talking head synthesis is a challenging task that attracts increasing attention in recent years. Although existing methods based on 2D landmarks or 3D face models can synthesize accurate lip synchronization and rhythmic head…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-10 · Yichen Han, Ya Li, Yingming Gao, Jinlong Xue, Songpo Wang, Lei Yang