Related papers: Data standardization for robust lip sync

LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization

In this paper, we present a video-based learning framework for animating personalized 3D talking faces from audio. We introduce two training-time data normalizations that significantly improve data sample efficiency. First, we isolate and…

Computer Vision and Pattern Recognition · Computer Science 2021-06-09 Avisek Lahiri, Vivek Kwatra, Christian Frueh, John Lewis, Chris Bregler

Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization

The task of lip synchronization (lip-sync) seeks to match the lips of human faces with different audio. It has various applications in the film industry as well as for creating virtual avatars and for video conferencing. This is a…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Soumik Mukhopadhyay, Saksham Suri, Ravi Teja Gadde, Abhinav Shrivastava

Style-Preserving Lip Sync via Audio-Aware Style Reference

Audio-driven lip sync has recently drawn significant attention due to its widespread application in the multimedia domain. Individuals exhibit distinct lip shapes when speaking the same utterance, attributed to the unique speaking styles of…

Computer Vision and Pattern Recognition · Computer Science 2025-06-19 Weizhi Zhong, Jichang Li, Yinqi Cai, Ming Li, Feng Gao, Liang Lin, Guanbin Li

Audio-driven Talking Face Generation with Stabilized Synchronization Loss

Talking face generation aims to create realistic videos with accurate lip synchronization and high visual quality, using given audio and reference video while preserving identity and visual characteristics. In this paper, we start by…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Dogucan Yaman, Fevziye Irem Eyiokur, Leonard Bärmann, Hazim Kemal Ekenel, Alexander Waibel

KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution

Lip synchronization, known as the task of aligning lip movements in an existing video with new input audio, is typically framed as a simpler variant of audio-driven facial animation. However, as well as suffering from the usual issues in…

Computer Vision and Pattern Recognition · Computer Science 2025-05-02 Antoni Bigata, Rodrigo Mira, Stella Bounareli, Michał Stypułkowski, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

UniSync: Towards Generalizable and High-Fidelity Lip Synchronization for Challenging Scenarios

Lip synchronization aims to generate realistic talking videos that match given audio, which is essential for high-quality video dubbing. However, current methods have fundamental drawbacks: mask-based approaches suffer from local color…

Computer Vision and Pattern Recognition · Computer Science 2026-03-05 Ruidi Fan, Yang Zhou, Siyuan Wang, Tian Yu, Yutong Jiang, Xusheng Liu

Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement

We aim to edit the lip movements in talking video according to the given speech while preserving the personal identity and visual details. The task can be decomposed into two sub-problems: (1) speech-driven lip motion generation and (2)…

Computer Vision and Pattern Recognition · Computer Science 2024-06-18 Runyi Yu, Tianyu He, Ailing Zhang, Yuchi Wang, Junliang Guo, Xu Tan, Chang Liu, Jie Chen, Jiang Bian

A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors

Lip sync has emerged as a promising technique for generating mouth movements from audio signals. However, synthesizing a high-resolution and photorealistic virtual news anchor is still challenging. Lack of natural appearance, visual…

Computer Vision and Pattern Recognition · Computer Science 2021-05-06 Ruobing Zheng, Zhou Zhu, Bo Song, Changjiang Ji

SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory

The challenge of talking face generation from speech lies in aligning two different modal information, audio and video, such that the mouth region corresponds to input audio. Previous methods either exploit audio-visual representation…

Computer Vision and Pattern Recognition · Computer Science 2022-11-04 Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro

Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization

Lip reading, the process of interpreting silent speech from visual lip movements, has gained rising attention for its wide range of realistic applications. Deep learning approaches greatly improve current lip reading systems. However, lip…

Artificial Intelligence · Computer Science 2024-05-03 Linzhi Wu, Xingyu Zhang, Yakun Zhang, Changyan Zheng, Tiejun Liu, Liang Xie, Ye Yan, Erwei Yin

OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers

Lip synchronization is the task of aligning a speaker's lip movements in video with corresponding speech audio, and it is essential for creating realistic, expressive video content. However, existing methods often rely on reference frames…

Computer Vision and Pattern Recognition · Computer Science 2025-09-19 Ziqiao Peng, Jiwen Liu, Haoxian Zhang, Xiaoqiang Liu, Songlin Tang, Pengfei Wan, Di Zhang, Hongyan Liu, Jun He

Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes

In recent years, DeepFake technology has achieved unprecedented success in high-quality video synthesis, but these methods also pose potential and severe security threats to humanity. DeepFake can be bifurcated into entertainment…

Computer Vision and Pattern Recognition · Computer Science 2024-10-29 Weifeng Liu, Tianyi She, Jiawei Liu, Boheng Li, Dongyu Yao, Ziyou Liang, Run Wang

LPIPS-AttnWav2Lip: Generic Audio-Driven lip synchronization for Talking Head Generation in the Wild

Researchers have shown a growing interest in Audio-driven Talking Head Generation. The primary challenge in talking head generation is achieving audio-visual coherence between the lips and the audio, known as lip synchronization. This paper…

Sound · Computer Science 2026-02-03 Zhipeng Chen, Xinheng Wang, Lun Xie, Haijie Yuan, Hang Pan

A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

In this work, we investigate the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment. Current works excel at producing accurate lip movements on a static image or videos of specific people…

Computer Vision and Pattern Recognition · Computer Science 2020-08-25 K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C V Jawahar

Towards Realistic Visual Dubbing with Heterogeneous Sources

The task of few-shot visual dubbing focuses on synchronizing the lip movements with arbitrary speech input for any talking head video. Albeit moderate improvements in current approaches, they commonly require high-quality homologous data…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Tianyi Xie, Liucheng Liao, Cheng Bi, Benlai Tang, Xiang Yin, Jianfei Yang, Mingjie Wang, Jiali Yao, Yang Zhang, Zejun Ma

LipSody: Lip-to-Speech Synthesis with Enhanced Prosody Consistency

Lip-to-speech synthesis aims to generate speech audio directly from silent facial video by reconstructing linguistic content from lip movements, providing valuable applications in situations where audio signals are unavailable or degraded.…

Sound · Computer Science 2026-02-03 Jaejun Lee, Yoori Oh, Kyogu Lee

On the Audio-visual Synchronization for Lip-to-Speech Synthesis

Most lip-to-speech (LTS) synthesis models are trained and evaluated under the assumption that the audio-video pairs in the dataset are perfectly synchronized. In this work, we show that the commonly used audio-visual datasets, such as GRID,…

Sound · Computer Science 2023-03-02 Zhe Niu, Brian Mak

SyncDiff: Diffusion-based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization

Talking head synthesis, also known as speech-to-lip synthesis, reconstructs the facial motions that align with the given audio tracks. The synthesized videos are evaluated on mainly two aspects, lip-speech synchronization and image…

Machine Learning · Computer Science 2025-03-18 Xulin Fan, Heting Gao, Ziyi Chen, Peng Chang, Mei Han, Mark Hasegawa-Johnson

FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency

Generating consecutive images of lip movements that align with a given speech in audio-driven lip synthesis is a challenging task. While previous studies have made strides in synchronization and visual quality, lip intelligibility and video…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Shiyan Liu, Rui Qu, Yan Jin

Lip Movements Generation at a Glance

Cross-modality generation is an emerging topic that aims to synthesize data in one modality based on information in a different modality. In this paper, we consider a task of such: given an arbitrary audio speech and one lip image of…

Computer Vision and Pattern Recognition · Computer Science 2018-05-23 Lele Chen, Zhiheng Li, Ross K. Maddox, Zhiyao Duan, Chenliang Xu