Related papers: SelfVC: Voice Conversion With Iterative Refinement…

SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

We present a self-supervised speech restoration method without paired speech corpora. Because the previous general speech restoration method uses artificial paired data created by applying various distortions to high-quality speech corpora,…

Sound · Computer Science · 2022-06-29 · Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, Hiroshi Saruwatari

Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion

Expressive voice conversion performs identity conversion for emotional speakers by jointly converting speaker identity and emotional style. Due to the hierarchical structure of speech emotion, it is challenging to disentangle the emotional…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-07-22 · Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li

PseudoVC: Improving One-shot Voice Conversion with Pseudo Paired Data

As parallel training data is scarce for one-shot voice conversion (VC) tasks, waveform reconstruction is typically performed by various VC systems. A typical one-shot VC system comprises a content encoder and a speaker encoder. However, two…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-03 · Songjun Cao, Qinghua Wu, Jie Chen, Jin Li, Long Ma

Pathological voice adaptation with autoencoder-based voice conversion

In this paper, we propose a new approach to pathological speech synthesis. Instead of using healthy speech as a source, we customise an existing pathological speech sample to a new speaker's voice characteristics. This approach alleviates…

Sound · Computer Science · 2021-06-17 · Marc Illa, Bence Mark Halpern, Rob van Son, Laureano Moro-Velazquez, Odette Scharenborg

Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion

Singing Voice Conversion (SVC) is a technique that enables any singer to perform any song. To achieve this, it is essential to obtain speaker-agnostic representations from the source audio, which poses a significant challenge. A common…

Sound · Computer Science · 2024-09-17 · Xueyao Zhang, Zihao Fang, Yicheng Gu, Haopeng Chen, Lexiao Zou, Junan Zhang, Liumeng Xue, Zhizheng Wu

Transfer Learning from Speech Synthesis to Voice Conversion with Non-Parallel Training Data

This paper presents a novel framework to build a voice conversion (VC) system by learning from a text-to-speech (TTS) synthesis system, that is called TTS-VC transfer learning. We first develop a multi-speaker speech synthesis system with…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-01-07 · Mingyang Zhang, Yi Zhou, Li Zhao, Haizhou Li

Pureformer-VC: Non-parallel Voice Conversion with Pure Stylized Transformer Blocks and Triplet Discriminative Training

As a foundational technology for intelligent human-computer interaction, voice conversion (VC) seeks to transform speech from any source timbre into any target timbre. Traditional voice conversion methods based on Generative Adversarial…

Sound · Computer Science · 2025-06-11 · Wenhan Yao, Fen Xiao, Xiarun Chen, Jia Liu, YongQiang He, Weiping Wen

Provable Speech Attributes Conversion via Latent Independence

While signal conversion and disentangled representation learning have shown promise for manipulating data attributes across domains such as audio, image, and multimodal generation, existing approaches, especially for speech style…

Sound · Computer Science · 2025-10-10 · Jonathan Svirsky, Ofir Lindenbaum, Uri Shaham

RefXVC: Cross-Lingual Voice Conversion with Enhanced Reference Leveraging

This paper proposes RefXVC, a method for cross-lingual voice conversion (XVC) that leverages reference information to improve conversion performance. Previous XVC works generally take an average speaker embedding to condition the speaker…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-06-25 · Mingyang Zhang, Yi Zhou, Yi Ren, Chen Zhang, Xiang Yin, Haizhou Li

HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts

We introduce HybridVC, a voice conversion (VC) framework built upon a pre-trained conditional variational autoencoder (CVAE) that combines the strengths of a latent model with contrastive learning. HybridVC supports text and audio prompts,…

Sound · Computer Science · 2024-09-26 · Xinlei Niu, Jing Zhang, Charles Patrick Martin

Zero-shot Singing Technique Conversion

In this paper, we propose modifications to the neural network framework AutoVC for the task of singing technique conversion. This includes utilising a pretrained singing technique encoder which extracts technique information, upon which a…

Sound · Computer Science · 2021-11-18 · Brendan O'Connor, Simon Dixon, George Fazekas

Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning

Voice style transfer, also called voice conversion, seeks to modify one speaker's voice to generate speech as if it came from another (target) speaker. Previous works have made progress on voice conversion with parallel training data and…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-03-18 · Siyang Yuan, Pengyu Cheng, Ruiyi Zhang, Weituo Hao, Zhe Gan, Lawrence Carin

Voice conversion with limited data and limitless data augmentations

Applying changes to an input speech signal to change the perceived speaker of speech to a target while maintaining the content of the input is a challenging but interesting task known as voice conversion (VC). Over the last few years, this…

Sound · Computer Science · 2022-12-29 · Olga Slizovskaia, Jordi Janer, Pritish Chandna, Oscar Mayor

SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion

Diffusion-based singing voice conversion (SVC) models have shown better synthesis quality compared to traditional methods. However, in cross-domain SVC scenarios, where there is a significant disparity in pitch between the source and target…

Sound · Computer Science · 2024-06-12 · Bingsong Bai, Fengping Wang, Yingming Gao, Ya Li

Pureformer-VC: Non-parallel One-Shot Voice Conversion with Pure Transformer Blocks and Triplet Discriminative Training

One-shot voice conversion (VC) aims to change the timbre of any source speech to match that of the target speaker with only one speech sample. Existing style transfer-based VC methods rely on speech representation disentanglement and…

Sound · Computer Science · 2024-11-26 · Wenhan Yao, Zedong Xing, Xiarun Chen, Jia Liu, Yongqiang He, Weiping Wen

A Pre-training Framework that Encodes Noise Information for Speech Quality Assessment

Self-supervised learning (SSL) has grown in interest within the speech processing community, since it produces representations that are useful for many downstream tasks. SSL uses global and contextual methods to produce robust…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-11-08 · Subrina Sultana, Donald S. Williamson

Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion

One-shot voice conversion (VC) with only a single target speaker's speech for reference has become a hot research topic. Existing works generally disentangle timbre, while information about pitch, rhythm and content is still mixed together.…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-08-24 · SiCheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng

VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net architecture

Voice conversion (VC) is a task that transforms the source speaker's timbre, accent, and tones in audio into another speaker's while preserving the linguistic content. It remains a challenging task, especially in a one-shot setting.…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-06-09 · Da-Yi Wu, Yen-Hao Chen, Hung-Yi Lee

Measuring the Effectiveness of Voice Conversion on Speaker Identification and Automatic Speech Recognition Systems

This paper evaluates the effectiveness of a Cycle-GAN based voice converter (VC) on four speaker identification (SID) systems and an automated speech recognition (ASR) system for various purposes. Audio samples converted by the VC model are…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-05-30 · Gokce Keskin, Tyler Lee, Cory Stephenson, Oguz H. Elibol

Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference

We propose a unified framework for Singing Voice Synthesis (SVS) and Conversion (SVC), addressing the limitations of existing approaches in cross-domain SVS/SVC, poor output musicality, and scarcity of singing data. Our framework enables…

Sound · Computer Science · 2025-01-24 · Shuqi Dai, Yunyun Wang, Roger B. Dannenberg, Zeyu Jin