
Related papers: SelfVC: Voice Conversion With Iterative Refinement…

200 papers

Voice conversion (VC) systems are widely used for several applications, from speaker anonymisation to personalised speech synthesis. Supervised approaches learn a mapping between different speakers using parallel data, which is expensive to…

Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker characteristics of an utterance to match an unseen target speaker without relying on parallel training data. Recently, self-supervised learning of speech representation…

Sound · Computer Science · 2022-02-14 · Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida

The goal of voice conversion is to transform the speech of a source speaker to sound like that of a reference speaker while preserving the original content. A key challenge is to extract disentangled linguistic content from the source and…

Sound · Computer Science · 2025-01-15 · Jaehun Kim, Ji-Hoon Kim, Yeunju Choi, Tan Dat Nguyen, Seongkyu Mun, Joon Son Chung

In this paper, we propose a novel voice conversion strategy to resolve the mismatch between the training and conversion scenarios when a parallel speech corpus is unavailable for training. Based on auto-encoder and disentanglement frameworks,…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-11-05 · Yoohwan Kwon, Soo-Whan Chung, Hee-Soo Heo, Hong-Goo Kang

Traditional voice conversion (VC) methods typically attempt to separate speaker identity and linguistic information into distinct representations, which are then combined to reconstruct the audio. However, effectively disentangling these…

Sound · Computer Science · 2025-10-13 · Huu Tuong Tu, Huan Vu, Cuong Tien Nguyen, Dien Hy Ngo, Nguyen Thi Thu Trang

Nowadays, recognition-synthesis-based methods have become quite popular in voice conversion (VC). By introducing linguistic features with good disentangling characteristics extracted from an automatic speech recognition (ASR) model, the VC…

Sound · Computer Science · 2023-05-17 · Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng

Voice conversion refers to transferring speaker identity with well-preserved content. Better disentanglement of speech representations leads to better voice conversion. Recent studies have found that phonetic information from input audio…

Sound · Computer Science · 2024-01-19 · Yimin Deng, Huaizhen Tang, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang

In this work, we propose a zero-shot voice conversion method using speech representations trained with self-supervised learning. First, we develop a multi-task model to decompose a speech utterance into features such as linguistic content,…

Sound · Computer Science · 2023-02-17 · Shehzeen Hussain, Paarth Neekhara, Jocelyn Huang, Jason Li, Boris Ginsburg

Voice Conversion (VC) modifies speech to match a target speaker while preserving linguistic content. Traditional methods usually extract speaker information directly from speech while neglecting the explicit utilization of linguistic…

Multimedia · Computer Science · 2025-06-04 · Fengjin Li, Jie Wang, Yadong Niu, Yongqing Wang, Meng Meng, Jian Luan, Zhiyong Wu

Voice Conversion (VC) for unseen speakers, also known as zero-shot VC, is an attractive research topic as it enables a range of applications like voice customization, animation production, and others. Recent work in this area made progress…

Sound · Computer Science · 2022-06-01 · Shijun Wang, Dimche Kostadinov, Damian Borth

Most current zero-shot voice conversion methods rely on externally supervised components, particularly speaker encoders, for training. To explore alternatives that eliminate this dependency, this paper introduces GenVC, a novel framework…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-08-21 · Zexin Cai, Henry Li Xinyuan, Ashi Garg, Leibny Paola García-Perera, Kevin Duh, Sanjeev Khudanpur, Matthew Wiesner, Nicholas Andrews

The any-to-any voice conversion problem aims to convert voices for source and target speakers that are absent from the training data. Previous works widely utilize disentanglement-based models. The disentanglement-based model assumes the speech…

Sound · Computer Science · 2022-02-23 · Qiqi Wang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Currently, zero-shot voice conversion systems are capable of synthesizing the voice of unseen speakers. However, most existing approaches struggle to accurately replicate the speaking style of the source speaker or mimic the distinctive…

Sound · Computer Science · 2025-06-02 · Kaidi Wang, Wenhao Guan, Ziyue Jiang, Hukai Huang, Peijie Chen, Weijie Wu, Qingyang Hong, Lin Li

Identity, accent, style, and emotions are essential components of human speech. Voice conversion (VC) techniques process the speech signals of two input speakers and other modalities of auxiliary information such as prompts and emotion…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-12-09 · Xining Song, Zhihua Wei, Rui Wang, Haixiao Hu, Yanxiang Chen, Meng Han

Face-based Voice Conversion (FVC) is a novel task that leverages facial images to generate the target speaker's voice style. Previous work has two shortcomings: (1) difficulty in obtaining facial embeddings that are well-aligned with the…

Sound · Computer Science · 2024-09-05 · Yan Rong, Li Liu

Large Language Models (LLMs) are one of the most promising technologies for the next era of speech generation systems, due to their scalability and in-context learning capabilities. Nevertheless, they suffer from multiple stability issues…

Voice conversion (VC) modifies voice characteristics while preserving linguistic content. This paper presents the Stepback network, a novel model for converting speaker identity using non-parallel data. Unlike traditional VC methods that…

Sound · Computer Science · 2025-01-28 · Qian Yang, Calbert Graham

Emotional voice conversion (VC) aims to convert a neutral voice to an emotional (e.g. happy) one while retaining the linguistic information and speaker identity. We note that the decoupling of emotional features from other speech…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-10-05 · Zhaojie Luo, Shoufeng Lin, Rui Liu, Jun Baba, Yuichiro Yoshikawa, Ishiguro Hiroshi

Voice conversion (VC) is a task that transfers the voice of the target audio to the source audio without losing linguistic content; it is challenging, especially when the source and target speakers are unseen during training (zero-shot VC). Previous…

Sound · Computer Science · 2021-04-14 · Shijun Wang, Damian Borth

Recently, voice conversion (VC) has been widely studied. Many VC systems use disentanglement-based learning techniques to separate the speaker and the linguistic content information from a speech signal. Subsequently, they convert the voice by…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-11-03 · Yen-Hao Chen, Da-Yi Wu, Tsung-Han Wu, Hung-yi Lee