Related papers: Data standardization for robust lip sync

200 papers

In this paper, we present a video-based learning framework for animating personalized 3D talking faces from audio. We introduce two training-time data normalizations that significantly improve data sample efficiency. First, we isolate and…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-09 · Avisek Lahiri, Vivek Kwatra, Christian Frueh, John Lewis, Chris Bregler

The task of lip synchronization (lip-sync) seeks to match the lips of human faces with different audio. It has various applications in the film industry as well as for creating virtual avatars and for video conferencing. This is a…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-21 · Soumik Mukhopadhyay, Saksham Suri, Ravi Teja Gadde, Abhinav Shrivastava

Audio-driven lip sync has recently drawn significant attention due to its widespread application in the multimedia domain. Individuals exhibit distinct lip shapes when speaking the same utterance, attributed to the unique speaking styles of…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-19 · Weizhi Zhong, Jichang Li, Yinqi Cai, Ming Li, Feng Gao, Liang Lin, Guanbin Li

Talking face generation aims to create realistic videos with accurate lip synchronization and high visual quality, using given audio and reference video while preserving identity and visual characteristics. In this paper, we start by…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-19 · Dogucan Yaman, Fevziye Irem Eyiokur, Leonard Bärmann, Hazim Kemal Ekenel, Alexander Waibel

Lip synchronization, known as the task of aligning lip movements in an existing video with new input audio, is typically framed as a simpler variant of audio-driven facial animation. However, as well as suffering from the usual issues in…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-02 · Antoni Bigata, Rodrigo Mira, Stella Bounareli, Michał Stypułkowski, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

Lip synchronization aims to generate realistic talking videos that match given audio, which is essential for high-quality video dubbing. However, current methods have fundamental drawbacks: mask-based approaches suffer from local color…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-05 · Ruidi Fan, Yang Zhou, Siyuan Wang, Tian Yu, Yutong Jiang, Xusheng Liu

We aim to edit the lip movements in talking video according to the given speech while preserving the personal identity and visual details. The task can be decomposed into two sub-problems: (1) speech-driven lip motion generation and (2)…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-18 · Runyi Yu, Tianyu He, Ailing Zhang, Yuchi Wang, Junliang Guo, Xu Tan, Chang Liu, Jie Chen, Jiang Bian

Lip sync has emerged as a promising technique for generating mouth movements from audio signals. However, synthesizing a high-resolution and photorealistic virtual news anchor is still challenging. Lack of natural appearance, visual…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-06 · Ruobing Zheng, Zhou Zhu, Bo Song, Changjiang Ji

The challenge of talking face generation from speech lies in aligning two different modal information, audio and video, such that the mouth region corresponds to input audio. Previous methods either exploit audio-visual representation…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-04 · Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro

Lip reading, the process of interpreting silent speech from visual lip movements, has gained rising attention for its wide range of realistic applications. Deep learning approaches greatly improve current lip reading systems. However, lip…

Artificial Intelligence · Computer Science · 2024-05-03 · Linzhi Wu, Xingyu Zhang, Yakun Zhang, Changyan Zheng, Tiejun Liu, Liang Xie, Ye Yan, Erwei Yin

Lip synchronization is the task of aligning a speaker's lip movements in video with corresponding speech audio, and it is essential for creating realistic, expressive video content. However, existing methods often rely on reference frames…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-19 · Ziqiao Peng, Jiwen Liu, Haoxian Zhang, Xiaoqiang Liu, Songlin Tang, Pengfei Wan, Di Zhang, Hongyan Liu, Jun He

In recent years, DeepFake technology has achieved unprecedented success in high-quality video synthesis, but these methods also pose potential and severe security threats to humanity. DeepFake can be bifurcated into entertainment…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-29 · Weifeng Liu, Tianyi She, Jiawei Liu, Boheng Li, Dongyu Yao, Ziyou Liang, Run Wang

Researchers have shown a growing interest in Audio-driven Talking Head Generation. The primary challenge in talking head generation is achieving audio-visual coherence between the lips and the audio, known as lip synchronization. This paper…

Sound · Computer Science · 2026-02-03 · Zhipeng Chen, Xinheng Wang, Lun Xie, Haijie Yuan, Hang Pan

In this work, we investigate the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment. Current works excel at producing accurate lip movements on a static image or videos of specific people…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-25 · K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C V Jawahar

The task of few-shot visual dubbing focuses on synchronizing the lip movements with arbitrary speech input for any talking head video. Albeit moderate improvements in current approaches, they commonly require high-quality homologous data…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-19 · Tianyi Xie, Liucheng Liao, Cheng Bi, Benlai Tang, Xiang Yin, Jianfei Yang, Mingjie Wang, Jiali Yao, Yang Zhang, Zejun Ma

Lip-to-speech synthesis aims to generate speech audio directly from silent facial video by reconstructing linguistic content from lip movements, providing valuable applications in situations where audio signals are unavailable or degraded.…

Sound · Computer Science · 2026-02-03 · Jaejun Lee, Yoori Oh, Kyogu Lee

Most lip-to-speech (LTS) synthesis models are trained and evaluated under the assumption that the audio-video pairs in the dataset are perfectly synchronized. In this work, we show that the commonly used audio-visual datasets, such as GRID,…

Sound · Computer Science · 2023-03-02 · Zhe Niu, Brian Mak

Talking head synthesis, also known as speech-to-lip synthesis, reconstructs the facial motions that align with the given audio tracks. The synthesized videos are evaluated on mainly two aspects, lip-speech synchronization and image…

Machine Learning · Computer Science · 2025-03-18 · Xulin Fan, Heting Gao, Ziyi Chen, Peng Chang, Mei Han, Mark Hasegawa-Johnson

Generating consecutive images of lip movements that align with a given speech in audio-driven lip synthesis is a challenging task. While previous studies have made strides in synchronization and visual quality, lip intelligibility and video…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-08 · Shiyan Liu, Rui Qu, Yan Jin

Cross-modality generation is an emerging topic that aims to synthesize data in one modality based on information in a different modality. In this paper, we consider a task of such: given an arbitrary audio speech and one lip image of…

Computer Vision and Pattern Recognition · Computer Science · 2018-05-23 · Lele Chen, Zhiheng Li, Ross K. Maddox, Zhiyao Duan, Chenliang Xu