Related papers: SelfVC: Voice Conversion With Iterative Refinement…

200 papers

Many existing works on voice conversion (VC) tasks use automatic speech recognition (ASR) models to ensure linguistic consistency between source and converted samples. However, for low-data-resource domains, training a high-quality…

Sound · Computer Science · 2023-05-25 · Mayank Kumar Singh, Naoya Takahashi, Onoe Naoyuki

Voice conversion (VC) techniques aim to modify the speaker identity of an utterance while preserving the underlying linguistic information. Most VC approaches do not model the speaking style (e.g. emotion and emphasis), which may contain…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-05-20 · Songxiang Liu, Yuewen Cao, Shiyin Kang, Na Hu, Xunying Liu, Dan Su, Dong Yu, Helen Meng

Traditional studies on voice conversion (VC) have made progress with parallel training data and known speakers. Good voice conversion quality is obtained by exploring better alignment modules or expressive mapping functions. In this study,…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-04-01 · Jiachen Lian, Chunlei Zhang, Dong Yu

Nowadays, as more and more systems achieve good performance in traditional voice conversion (VC) tasks, people's attention gradually turns to VC tasks under extreme conditions. In this paper, we propose a novel method for zero-shot voice…

Sound · Computer Science · 2023-04-04 · Haozhe Zhang, Zexin Cai, Xiaoyi Qin, Ming Li

Non-parallel many-to-many voice conversion remains an interesting but challenging speech processing task. Recently, AutoVC, a conditional autoencoder based method, achieved excellent conversion results by disentangling the speaker identity…

Sound · Computer Science · 2022-08-09 · Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Zhen Zeng, Edward Xiao, Jing Xiao

Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks. Since the majority of the downstream tasks…

A singing voice conversion model converts a song in the voice of an arbitrary source singer to the voice of a target singer. Recently, methods that leverage self-supervised audio representations such as HuBERT and Wav2Vec 2.0 have helped…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-03-23 · Tejas Jayashankar, Jilong Wu, Leda Sari, David Kant, Vimal Manohar, Qing He

Voice conversion (VC) aims to modify the speaker's timbre while retaining speech content. Previous approaches have tokenized the outputs of self-supervised models into semantic tokens, facilitating disentanglement of speech content information.…

Sound · Computer Science · 2024-09-11 · Zhengyang Chen, Shuai Wang, Mingyang Zhang, Xuechen Liu, Junichi Yamagishi, Yanmin Qian

Recently, voice conversion (VC) without parallel data has been successfully adapted to the multi-target scenario, in which a single model is trained to convert the input voice to many different speakers. However, such a model suffers from the…

Machine Learning · Computer Science · 2019-08-23 · Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee

Singing voice conversion converts a source singing voice into a target singing voice while leaving the content unchanged. Currently, flow-based models can complete the task of voice conversion, but they struggle to effectively extract latent…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-09-10 · Hui Li, Hongyu Wang, Zhijin Chen, Bohan Sun, Bo Li

One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement. Existing work generally…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-07-22 · Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

Speech time reversal refers to the process of reversing the entire speech signal in time, causing it to play backward. Such signals are completely unintelligible since the fundamental structures of phonemes and syllables are destroyed.…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-10-02 · Ishan D. Biyani, Nirmesh J. Shah, Ashishkumar P. Gudmalwar, Pankaj Wasnik, Rajiv R. Shah

Disentangling speaker and content attributes of a speech signal into separate latent representations followed by decoding the content with an exchanged speaker representation is a popular approach for voice conversion, which can be trained…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-09-07 · Michael Kuhlmann, Fritz Seebauer, Janek Ebbers, Petra Wagner, Reinhold Haeb-Umbach

Here we present a novel approach to conditioning the SampleRNN generative model for voice conversion (VC). Conventional methods for VC modify the perceived speaker identity by converting between source and target acoustic features. Our…

Sound · Computer Science · 2018-10-30 · Cong Zhou, Michael Horgan, Vivek Kumar, Cristina Vasco, Dan Darcy

Any-to-any singing voice conversion (SVC) is confronted with the "timbre leakage" issue caused by inadequate disentanglement between the content and the speaker timbre. To address this issue, this study introduces NeuCoSVC, a…

Sound · Computer Science · 2024-01-09 · Binzhu Sha, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng

We propose a speech enhancement system that combines speaker-agnostic speech restoration with voice conversion (VC) to obtain a studio-level quality speech signal. While voice conversion models are typically used to change speaker…

Sound · Computer Science · 2025-05-22 · Kyungguen Byun, Jason Filos, Erik Visser, Sunkuk Moon

Speaker identity is one of the important characteristics of human speech. In voice conversion, we change the speaker identity from one to another, while keeping the linguistic content unchanged. Voice conversion involves multiple speech…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-11-18 · Berrak Sisman, Junichi Yamagishi, Simon King, Haizhou Li

Voice Conversion (VC) converts the voice of a source speech to that of a target while maintaining the source's content. Speech can be mainly decomposed into four components: content, timbre, rhythm and pitch. Unfortunately, most related…

Sound · Computer Science · 2023-06-22 · Zhonghua Liu, Shijun Wang, Ning Chen

Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the…

Sound · Computer Science · 2024-08-27 · Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang

Collecting speech data is an important step in training speech recognition systems and other speech-based machine learning models. However, the issue of privacy protection is an increasing concern that must be addressed. The current study…