Related papers: Puppet Dubbing

200 papers

Many speech segments in movies are re-recorded in a studio during postproduction, to compensate for poor sound quality as recorded on location. Manual alignment of the newly-recorded speech with the original lip movements is a tedious task.…

Computer Vision and Pattern Recognition · Computer Science 2018-08-21 Tavi Halperin, Ariel Ephrat, Shmuel Peleg

Dubbing is a type of audiovisual translation where dialogues are translated and enacted so that they give the impression that the media is in the target language. It requires a careful alignment of dubbed recordings with the lip movements…

Computation and Language · Computer Science 2019-08-21 Alp Öktem, Mireia Farrús, Antonio Bonafonte

Movie dubbing seeks to synthesize speech from a given script using a specific voice, while ensuring accurate lip synchronization and emotion-prosody alignment with the character's visual performance. However, existing alignment approaches…

Sound · Computer Science 2025-12-22 Zhedong Zhang, Liang Li, Gaoxiang Cong, Chunshan Liu, Yuhan Gao, Xiaowan Wang, Tao Gu, Yuankai Qi

The goal of automatic dubbing is to perform speech-to-speech translation while achieving audiovisual coherence. This entails isochrony, i.e., translating the original speech by also matching its prosodic structure into phrases and pauses,…

Computation and Language · Computer Science 2022-04-07 Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote

Dubbing is a post-production process of re-recording actors' dialogues, which is extensively used in filmmaking and video production. It is usually performed manually by professional voice actors who read lines with proper prosody, and in…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-16 Chenxu Hu, Qiao Tian, Tingle Li, Yuping Wang, Yuxuan Wang, Hang Zhao

Video dubbing aims to synthesize realistic, lip-synced videos from a reference video and a driving audio signal. Although existing methods can accurately generate mouth shapes driven by audio, they often fail to preserve identity-specific…

Computer Vision and Pattern Recognition · Computer Science 2025-01-10 Runzhen Liu, Qinjie Lin, Yunfei Liu, Lijian Lin, Ye Zhu, Yu Li, Chuhua Xian, Fa-Ting Hong

Visual dubbing is the process of generating lip motions of an actor in a video to synchronise with given audio. Recent advances have made progress towards this goal but have not been able to produce an approach suitable for mass adoption.…

Computer Vision and Pattern Recognition · Computer Science 2024-01-12 Jack Saunders, Vinay Namboodiri

Audiovisual representation learning typically relies on the correspondence between sight and sound. However, there are often multiple audio tracks that can correspond with a visual scene. Consider, for example, different conversations on…

Sound · Computer Science 2024-06-11 Nikhil Singh, Chih-Wei Wu, Iroro Orife, Mahdi Kalayeh

The task of few-shot visual dubbing focuses on synchronizing the lip movements with arbitrary speech input for any talking head video. Albeit moderate improvements in current approaches, they commonly require high-quality homologous data…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Tianyi Xie, Liucheng Liao, Cheng Bi, Benlai Tang, Xiang Yin, Jianfei Yang, Mingjie Wang, Jiali Yao, Yang Zhang, Zejun Ma

Automatic dubbing (AD) is the task of translating the original speech in a video into target language speech. The new target language speech should satisfy isochrony; that is, the new speech should be time aligned with the original video,…

Computation and Language · Computer Science 2023-02-28 Alexandra Chronopoulou, Brian Thompson, Prashant Mathur, Yogesh Virkar, Surafel M. Lakew, Marcello Federico

We present VoiceCraft-Dub, a novel approach for automated video dubbing that synthesizes high-quality speech from text and facial cues. This task has broad applications in filmmaking, multimedia creation, and assisting voice-impaired…

Computer Vision and Pattern Recognition · Computer Science 2025-04-04 Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng, Joon Son Chung, Tae-Hyun Oh, David Harwath

Recently, artificial intelligence-based dubbing technology has advanced, enabling automated dubbing (AD) to convert the source speech of a video into target speech in different languages. However, natural AD still faces synchronization…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-05 Changi Hong, Yoonah Song, Hwayoung Park, Chaewoon Bang, Dayeon Ku, Do Hyun Lee, Hong Kook Kim

Movie Dubbing aims to convert scripts into speeches that align with the given movie clip in both temporal and emotional aspects while preserving the vocal timbre of a given brief reference audio. Existing methods focus primarily on reducing…

Movie dubbing describes the process of transforming a script into speech that aligns temporally and emotionally with a given movie clip while exemplifying the speaker's voice demonstrated in a short reference audio clip. This task demands…

Sound · Computer Science 2025-03-19 Zhedong Zhang, Liang Li, Chenggang Yan, Chunshan Liu, Anton van den Hengel, Yuankai Qi

Visual dubbing, the synchronization of facial movements with new speech, is crucial for making content accessible across different languages, enabling broader global reach. However, current methods face significant limitations. Existing…

Computer Vision and Pattern Recognition · Computer Science 2025-05-30 Binyamin Manela, Sharon Gannot, Ethan Fetyaya

Full-duplex dialog models aim to listen and speak simultaneously, delivering rapid responses to dynamic user input. Among different solutions to full-duplexity, a native solution merges multiple channels in each time step, achieving the…

Sound · Computer Science 2026-02-02 Yiqun Yao, Xiang Li, Xin Jiang, Xuezhi Fang, Naitong Yu, Wenjia Ma, Aixin Sun, Yequan Wang

We present Neural Voice Puppetry, a novel approach for audio-driven facial video synthesis. Given an audio sequence of a source person or digital assistant, we generate a photo-realistic output video of a target person that is in sync with…

Computer Vision and Pattern Recognition · Computer Science 2020-07-30 Justus Thies, Mohamed Elgharib, Ayush Tewari, Christian Theobalt, Matthias Nießner

Video dubbing aims to translate the original speech in a film or television program into the speech in a target language, which can be achieved with a cascaded system consisting of speech recognition, machine translation and speech…

Computation and Language · Computer Science 2023-12-06 Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian

Dubbing is a technique for translating video content from one language to another. However, state-of-the-art visual dubbing techniques directly copy facial expressions from source to target actors without considering identity-specific…

Computer Vision and Pattern Recognition · Computer Science 2019-09-09 Hyeongwoo Kim, Mohamed Elgharib, Michael Zollhöfer, Hans-Peter Seidel, Thabo Beeler, Christian Richardt, Christian Theobalt

Existing automated dubbing methods are usually designed for Professionally Generated Content (PGC) production, which requires massive training data and training time to learn a person-specific audio-video mapping. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2023-09-04 Linsen Song, Wayne Wu, Chaoyou Fu, Chen Change Loy, Ran He