Related papers: SelfVC: Voice Conversion With Iterative Refinement…

200 papers

We present a self-supervised speech restoration method without paired speech corpora. Because the previous general speech restoration method uses artificial paired data created by applying various distortions to high-quality speech corpora,…

Expressive voice conversion performs identity conversion for emotional speakers by jointly converting speaker identity and emotional style. Due to the hierarchical structure of speech emotion, it is challenging to disentangle the emotional…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-22 Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li

As parallel training data is scarce for one-shot voice conversion (VC) tasks, waveform reconstruction is typically performed by various VC systems. A typical one-shot VC system comprises a content encoder and a speaker encoder. However, two…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-03 Songjun Cao, Qinghua Wu, Jie Chen, Jin Li, Long Ma

In this paper, we propose a new approach to pathological speech synthesis. Instead of using healthy speech as a source, we customise an existing pathological speech sample to a new speaker's voice characteristics. This approach alleviates…

Singing Voice Conversion (SVC) is a technique that enables any singer to perform any song. To achieve this, it is essential to obtain speaker-agnostic representations from the source audio, which poses a significant challenge. A common…

Sound · Computer Science 2024-09-17 Xueyao Zhang, Zihao Fang, Yicheng Gu, Haopeng Chen, Lexiao Zou, Junan Zhang, Liumeng Xue, Zhizheng Wu

This paper presents a novel framework to build a voice conversion (VC) system by learning from a text-to-speech (TTS) synthesis system, that is called TTS-VC transfer learning. We first develop a multi-speaker speech synthesis system with…

Audio and Speech Processing · Electrical Eng. & Systems 2021-01-07 Mingyang Zhang, Yi Zhou, Li Zhao, Haizhou Li

As a foundational technology for intelligent human-computer interaction, voice conversion (VC) seeks to transform speech from any source timbre into any target timbre. Traditional voice conversion methods based on Generative Adversarial…

Sound · Computer Science 2025-06-11 Wenhan Yao, Fen Xiao, Xiarun Chen, Jia Liu, YongQiang He, Weiping Wen

While signal conversion and disentangled representation learning have shown promise for manipulating data attributes across domains such as audio, image, and multimodal generation, existing approaches, especially for speech style…

Sound · Computer Science 2025-10-10 Jonathan Svirsky, Ofir Lindenbaum, Uri Shaham

This paper proposes RefXVC, a method for cross-lingual voice conversion (XVC) that leverages reference information to improve conversion performance. Previous XVC works generally take an average speaker embedding to condition the speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-25 Mingyang Zhang, Yi Zhou, Yi Ren, Chen Zhang, Xiang Yin, Haizhou Li

We introduce HybridVC, a voice conversion (VC) framework built upon a pre-trained conditional variational autoencoder (CVAE) that combines the strengths of a latent model with contrastive learning. HybridVC supports text and audio prompts,…

Sound · Computer Science 2024-09-26 Xinlei Niu, Jing Zhang, Charles Patrick Martin

In this paper, we propose modifications to the neural network framework AutoVC for the task of singing technique conversion. This includes utilising a pretrained singing technique encoder which extracts technique information, upon which a…

Sound · Computer Science 2021-11-18 Brendan O'Connor, Simon Dixon, George Fazekas

Voice style transfer, also called voice conversion, seeks to modify one speaker's voice to generate speech as if it came from another (target) speaker. Previous works have made progress on voice conversion with parallel training data and…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-18 Siyang Yuan, Pengyu Cheng, Ruiyi Zhang, Weituo Hao, Zhe Gan, Lawrence Carin

Applying changes to an input speech signal to change the perceived speaker of speech to a target while maintaining the content of the input is a challenging but interesting task known as voice conversion (VC). Over the last few years, this…

Sound · Computer Science 2022-12-29 Olga Slizovskaia, Jordi Janer, Pritish Chandna, Oscar Mayor

Diffusion-based singing voice conversion (SVC) models have shown better synthesis quality compared to traditional methods. However, in cross-domain SVC scenarios, where there is a significant disparity in pitch between the source and target…

Sound · Computer Science 2024-06-12 Bingsong Bai, Fengping Wang, Yingming Gao, Ya Li

One-shot voice conversion (VC) aims to change the timbre of any source speech to match that of the target speaker with only one speech sample. Existing style transfer-based VC methods rely on speech representation disentanglement and…

Sound · Computer Science 2024-11-26 Wenhan Yao, Zedong Xing, Xiarun Chen, Jia Liu, Yongqiang He, Weiping Wen

Self-supervised learning (SSL) has grown in interest within the speech processing community, since it produces representations that are useful for many downstream tasks. SSL uses global and contextual methods to produce robust…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-08 Subrina Sultana, Donald S. Williamson

One-shot voice conversion (VC) with only a single target speaker's speech for reference has become a hot research topic. Existing works generally disentangle timbre, while information about pitch, rhythm and content is still mixed together.…

Audio and Speech Processing · Electrical Eng. & Systems 2022-08-24 SiCheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng

Voice conversion (VC) is a task that transforms the source speaker's timbre, accent, and tones in audio into another speaker's while preserving the linguistic content. It remains a challenging task, especially in a one-shot setting.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-09 Da-Yi Wu, Yen-Hao Chen, Hung-Yi Lee

This paper evaluates the effectiveness of a Cycle-GAN based voice converter (VC) on four speaker identification (SID) systems and an automated speech recognition (ASR) system for various purposes. Audio samples converted by the VC model are…

Audio and Speech Processing · Electrical Eng. & Systems 2019-05-30 Gokce Keskin, Tyler Lee, Cory Stephenson, Oguz H. Elibol

We propose a unified framework for Singing Voice Synthesis (SVS) and Conversion (SVC), addressing the limitations of existing approaches in cross-domain SVS/SVC, poor output musicality, and scarcity of singing data. Our framework enables…

Sound · Computer Science 2025-01-24 Shuqi Dai, Yunyun Wang, Roger B. Dannenberg, Zeyu Jin