Related papers: Zero-shot Voice Conversion with Diffusion Transfor…

200 papers

Zero-shot voice conversion (VC) aims to convert the original speaker's timbre to any target speaker while keeping the linguistic content. Current mainstream zero-shot voice conversion approaches depend on pre-trained recognition models to…

Sound · Computer Science 2024-12-04 Yuke Li, Xinfa Zhu, Hanzhao Li, JiXun Yao, WenJie Tian, XiPeng Yang, YunLin Chen, Zhifei Li, Lei Xie

Zero-Shot Voice Conversion (VC) aims to transform the source speaker's timbre into an arbitrary unseen one while retaining speech content. Most prior work focuses on preserving the source's prosody, while fine-grained timbre information may…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-03 Jialong Zuo, Shengpeng Ji, Minghui Fang, Mingze Li, Ziyue Jiang, Xize Cheng, Xiaoda Yang, Chen Feiyang, Xinyu Duan, Zhou Zhao

Zero-shot voice conversion (VC) aims to transfer the source speaker timbre to arbitrary unseen target speaker timbre, while keeping the linguistic content unchanged. Although the voice of generated speech can be controlled by providing the…

Sound · Computer Science 2024-01-31 Junjie Li, Yiwei Guo, Xie Chen, Kai Yu

Zero-shot voice conversion aims to transfer the voice of a source speaker to that of a speaker unseen during training, while preserving the content information. Although various methods have been proposed to reconstruct speaker information…

Sound · Computer Science 2024-08-22 Anastasia Avdeeva, Aleksei Gusev

Zero-shot voice conversion (VC) aims to transfer the timbre from the source speaker to an arbitrary unseen speaker while preserving the original linguistic content. Despite recent advancements in zero-shot VC using language model-based or…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-11 Jixun Yao, Yuguang Yang, Yu Pan, Ziqian Ning, Jianhao Ye, Hongbin Zhou, Lei Xie

Expressive zero-shot voice conversion (VC) is a critical and challenging task that aims to transform the source timbre into an arbitrary unseen speaker while preserving the original content and expressive qualities. Despite recent progress…

Sound · Computer Science 2025-01-13 Yuguang Yang, Yu Pan, Jixun Yao, Xiang Zhang, Jianhao Ye, Hongbin Zhou, Lei Xie, Lei Ma, Jianjun Zhao

Zero-shot voice conversion (VC) synthesizes speech in a target speaker's voice while preserving linguistic and paralinguistic content. However, timbre leakage, where source speaker traits persist, remains a challenge, especially in neural…

Audio and Speech Processing · Electrical Eng. & Systems 2025-07-15 Shivam Mehta, Yingru Liu, Zhenyu Tang, Kainan Peng, Vimal Manohar, Shun Zhang, Mike Seltzer, Qing He, Mingbo Ma

Despite recent advances in zero-shot voice conversion (VC), achieving speaker similarity and naturalness comparable to ground-truth recordings remains a significant challenge. In this letter, we propose CTEFM-VC, a zero-shot VC framework…

Sound · Computer Science 2025-08-12 Yu Pan, Yuguang Yang, Jixun Yao, Lei Ma, Jianjun Zhao

This study presents an innovative Zero-Shot any-to-any Singing Voice Conversion (SVC) method, leveraging a novel clustering-based phoneme representation to effectively separate content, timbre, and singing style. This approach enables…

Sound · Computer Science 2024-10-15 Wangjin Zhou, Fengrun Zhang, Yiming Liu, Wenhao Guan, Yi Zhao, Tatsuya Kawahara

Zero-shot voice conversion (VC) aims to transfer timbre from a source speaker to any unseen target speaker while preserving linguistic content. Growing application scenarios demand models with streaming inference capabilities. This has…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-23 Guobin Ma, Jixun Yao, Ziqian Ning, Yuepeng Jiang, Lingxin Xiong, Lei Xie, Pengcheng Zhu

Singing voice conversion (SVC) aims to render the target singer's timbre while preserving melody and lyrics. However, existing zero-shot SVC systems remain fragile in real songs due to harmony interference, F0 errors, and the lack of…

Zero-shot singing voice conversion (SVC) transforms a source singer's timbre to an unseen target speaker's voice while preserving melodic content without fine-tuning. Existing methods model speaker timbre and vocal content separately,…

Sound · Computer Science 2025-11-18 Bingsong Bai, Yizhong Geng, Fengping Wang, Cong Wang, Puyuan Guo, Yingming Gao, Ya Li

In real-world voice conversion applications, environmental noise in source speech and user demands for expressive output pose critical challenges. Traditional ASR-based methods ensure noise robustness but suppress prosody richness, while…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-11 Yuepeng Jiang, Ziqian Ning, Shuai Wang, Chengjia Wang, Mengxiao Bi, Pengcheng Zhu, Zhonghua Fu, Lei Xie

Zero-shot voice conversion (VC) aims to convert a source utterance into the voice of an unseen target speaker while preserving its linguistic content. Although recent systems have improved conversion quality, building zero-shot VC systems…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-23 Qixi Zheng, Yuxiang Zhao, Tianrui Wang, Wenxi Chen, Kele Xu, Yikang Li, Qinyuan Chen, Xipeng Qiu, Kai Yu, Xie Chen

Voice Conversion research in recent times has increasingly focused on improving the zero-shot capabilities of existing methods. Despite remarkable advancements, current architectures still tend to struggle in zero-shot cross-lingual…

Sound · Computer Science 2025-05-26 Advait Joglekar, Divyanshu Singh, Rooshil Rohit Bhatia, S. Umesh

Nowadays, as more and more systems achieve good performance in traditional voice conversion (VC) tasks, people's attention gradually turns to VC tasks under extreme conditions. In this paper, we propose a novel method for zero-shot voice…

Sound · Computer Science 2023-04-04 Haozhe Zhang, Zexin Cai, Xiaoyi Qin, Ming Li

The goal of voice conversion is to transform the speech of a source speaker to sound like that of a reference speaker while preserving the original content. A key challenge is to extract disentangled linguistic content from the source and…

Sound · Computer Science 2025-01-15 Jaehun Kim, Ji-Hoon Kim, Yeunju Choi, Tan Dat Nguyen, Seongkyu Mun, Joon Son Chung

Traditional studies on voice conversion (VC) have made progress with parallel training data and known speakers. Good voice conversion quality is obtained by exploring better alignment modules or expressive mapping functions. In this study,…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-01 Jiachen Lian, Chunlei Zhang, Dong Yu

Style voice conversion aims to transform the speaking style of source speech into a desired style while keeping the original speaker's identity. However, previous style voice conversion approaches primarily focus on well-defined domains…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-09 Xinfa Zhu, Lei He, Yujia Xiao, Xi Wang, Xu Tan, Sheng Zhao, Lei Xie

Although voice conversion (VC) systems have shown a remarkable ability to transfer voice style, existing methods still have an inaccurate pitch and low speaker adaptation quality. To address these challenges, we introduce Diff-HierVC, a…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-09 Ha-Yeong Choi, Sang-Hoon Lee, Seong-Whan Lee