Related papers: Voice Conversion from Non-parallel Corpora Using V…

Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders

We propose a flexible framework that deals with both singer conversion and singers' vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-26 Yin-Jyun Luo, Chin-Chen Hsu, Kat Agres, Dorien Herremans

Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders

An effective approach to non-parallel voice conversion (VC) is to utilize deep neural networks (DNNs), specifically variational auto encoders (VAEs), to model the latent structure of speech in an unsupervised manner. A previous study has…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-09 Wen-Chin Huang, Hsin-Te Hwang, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang

Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks

Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios. In most situations, the source and the target speakers do not repeat the same texts or they may even…

Computation and Language · Computer Science 2017-06-09 Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

In this paper, we present a novel technique for a non-parallel voice conversion (VC) with the use of cyclic variational autoencoder (CycleVAE)-based spectral modeling. In a variational autoencoder (VAE) framework, a latent space, usually…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-25 Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda

Blind Training for Channel-Adaptive Digital Semantic Communications

Semantic encoders and decoders for digital semantic communication (SC) often struggle to adapt to variations in unpredictable channel environments and diverse system designs. To address these challenges, this paper proposes a novel…

Signal Processing · Electrical Eng. & Systems 2025-03-20 Yongjeong Oh, Joohyuk Park, Jinho Choi, Jihong Park, Yo-Seb Jeon

Pureformer-VC: Non-parallel Voice Conversion with Pure Stylized Transformer Blocks and Triplet Discriminative Training

As a foundational technology for intelligent human-computer interaction, voice conversion (VC) seeks to transform speech from any source timbre into any target timbre. Traditional voice conversion methods based on Generative Adversarial…

Sound · Computer Science 2025-06-11 Wenhan Yao, Fen Xiao, Xiarun Chen, Jia Liu, YongQiang He, Weiping Wen

Learning in your voice: Non-parallel voice conversion based on speaker consistency loss

In this paper, we propose a novel voice conversion strategy to resolve the mismatch between the training and conversion scenarios when parallel speech corpus is unavailable for training. Based on auto-encoder and disentanglement frameworks,…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-05 Yoohwan Kwon, Soo-Whan Chung, Hee-Soo Heo, Hong-Goo Kang

Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion

Nowadays, recognition-synthesis-based methods have been quite popular with voice conversion (VC). By introducing linguistics features with good disentangling characters extracted from an automatic speech recognition (ASR) model, the VC…

Sound · Computer Science 2023-05-17 Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng

Adversarially Trained Autoencoders for Parallel-Data-Free Voice Conversion

We present a method for converting the voices between a set of speakers. Our method is based on training multiple autoencoder paths, where there is a single speaker-independent encoder and multiple speaker-dependent decoders. The…

Audio and Speech Processing · Electrical Eng. & Systems 2019-05-13 Orhan Ocal, Oguz H. Elibol, Gokce Keskin, Cory Stephenson, Anil Thomas, Kannan Ramchandran

Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations

Recently, cycle-consistent adversarial network (Cycle-GAN) has been successfully applied to voice conversion to a different speaker without parallel data, although in those approaches an individual model is needed for each target speaker.…

Audio and Speech Processing · Electrical Eng. & Systems 2018-06-26 Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee, Lin-shan Lee

Pureformer-VC: Non-parallel One-Shot Voice Conversion with Pure Transformer Blocks and Triplet Discriminative Training

One-shot voice conversion (VC) aims to change the timbre of any source speech to match that of the target speaker with only one speech sample. Existing style transfer-based VC methods relied on speech representation disentanglement and…

Sound · Computer Science 2024-11-26 Wenhan Yao, Zedong Xing, Xiarun Chen, Jia Liu, Yongqiang He, Weiping Wen

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data

Singing voice conversion aims to convert singer's voice from source to target without changing singing content. Parallel training data is typically required for the training of singing voice conversion system, that is however not practical…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-04 Junchen Lu, Kun Zhou, Berrak Sisman, Haizhou Li

Generalization of Spectrum Differential based Direct Waveform Modification for Voice Conversion

We present a modification to the spectrum differential based direct waveform modification for voice conversion (DIFFVC) so that it can be directly applied as a waveform generation module to voice conversion models. The recently proposed…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-30 Wen-Chin Huang, Yi-Chiao Wu, Kazuhiro Kobayashi, Yu-Huai Peng, Hsin-Te Hwang, Patrick Lumban Tobing, Yu Tsao, Hsin-Min Wang, Tomoki Toda

Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces

Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision. Inspired by the success in unsupervised cross-lingual word embeddings, in this paper…

Computation and Language · Computer Science 2018-09-24 Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder

In this paper, we present an open-source software for developing a nonparallel voice conversion (VC) system named crank. Although we have released an open-source VC software based on the Gaussian mixture model named sprocket in the last VC…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-05 Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda

Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations

This paper presents a method of sequence-to-sequence (seq2seq) voice conversion using non-parallel training data. In this method, disentangled linguistic and speaker representations are extracted from acoustic features, and voice conversion…

Audio and Speech Processing · Electrical Eng. & Systems 2020-01-14 Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

Investigation of Using Disentangled and Interpretable Representations for One-shot Cross-lingual Voice Conversion

We study the problem of cross-lingual voice conversion in non-parallel speech corpora and one-shot learning setting. Most prior work requires either parallel speech corpora or enough amount of training data from a target speaker. However, we…

Sound · Computer Science 2018-08-17 Seyed Hamidreza Mohammadi, Taehwan Kim

FastVC: Fast Voice Conversion with non-parallel data

This paper introduces FastVC, an end-to-end model for fast Voice Conversion (VC). The proposed model can convert speech of arbitrary length from multiple source speakers to multiple target speakers. FastVC is based on a conditional…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-07 Oriol Barbany Mayor, Milos Cernak

Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion

Voice conversion is to generate a new speech with the source content and a target voice style. In this paper, we focus on one general setting, i.e., non-parallel many-to-many voice conversion, which is close to the real-world scenario. As…

Sound · Computer Science 2022-07-28 Jian Ma, Zhedong Zheng, Hao Fei, Feng Zheng, Tat-seng Chua, Yi Yang

Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities

Voice conversion (VC) using sequence-to-sequence learning of context posterior probabilities is proposed. Conventional VC using shared context posterior probabilities predicts target speech parameters from the context posterior…

Sound · Computer Science 2017-08-08 Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari