Related papers: SelfVC: Voice Conversion With Iterative Refinement…

200 papers

Previous research has shown that established techniques for spoken voice conversion (VC) do not perform as well when applied to singing voice conversion (SVC). We propose an alternative loss component in a loss function that is otherwise…

Sound · Computer Science 2023-02-28 Brendan O'Connor, Simon Dixon

Significant strides have been made in creating voice identity representations using speech data. However, the same level of progress has not been achieved for singing voices. To bridge this gap, we suggest a framework for training singer…

Sound · Computer Science 2024-01-11 Bernardo Torres, Stefan Lattner, Gaël Richard

Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker. While there is a rich literature on VC, most proposed methods are trained and…

In voice conversion (VC), it is crucial to preserve complete semantic information while accurately modeling the target speaker's timbre and prosody. This paper proposes FabasedVC to achieve VC with enhanced similarity in timbre, prosody,…

Sound · Computer Science 2025-11-14 Wenyu Wang, Zhetao Hu, Yiquan Zhou, Jiacheng Xu, Zhiyu Wu, Chen Li, Shihao Li

Singing Voice Synthesis (SVS) has witnessed significant advancements with the advent of deep learning techniques. However, a significant challenge in SVS is the scarcity of labeled singing voice data, which limits the effectiveness of…

Sound · Computer Science 2024-12-17 Yifeng Yu, Jiatong Shi, Yuning Wu, Yuxun Tang, Shinji Watanabe

We present a novel approach to any-to-one (A2O) voice conversion (VC) in a sequence-to-sequence (seq2seq) framework. A2O VC aims to convert any speaker, including those unseen during training, to a fixed target speaker. We utilize…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-26 Wen-Chin Huang, Yi-Chiao Wu, Tomoki Hayashi, Tomoki Toda

Recently, cycle-consistent adversarial network (Cycle-GAN) has been successfully applied to voice conversion to a different speaker without parallel data, although in those approaches an individual model is needed for each target speaker.…

Audio and Speech Processing · Electrical Eng. & Systems 2018-06-26 Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee, Lin-shan Lee

Voice conversion (VC) refers to changing the timbre of speech while retaining the discourse content. Recently, many works have focused on disentanglement-based learning techniques to separate the timbre and the linguistic content information…

Sound · Computer Science 2022-02-22 Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Typically, singing voice conversion (SVC) depends on an embedding vector, extracted from either a speaker lookup table (LUT) or a speaker recognition network (SRN), to model speaker identity. However, singing contains more expressive…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-07 Xu Li, Shansong Liu, Ying Shan

Voice imitation aims to transform source speech to match a reference speaker's timbre and speaking style while preserving linguistic content. A straightforward approach is to train on triplets of (source, reference, target), where source…

Sound · Computer Science 2026-04-21 Tao Feng, Yuxiang Wang, Yuancheng Wang, Xueyao Zhang, Dekun Chen, Chaoren Wang, Xun Guan, Zhizheng Wu

Building cross-lingual voice conversion (VC) systems for multiple speakers and multiple languages has been a challenging task for a long time. This paper describes a parallel non-autoregressive network to achieve bilingual and code-switched…

Audio and Speech Processing · Electrical Eng. & Systems 2021-04-23 Yaogen Yang, Haozhe Zhang, Xiaoyi Qin, Shanshan Liang, Huahua Cui, Mingyang Xu, Ming Li

Disentangled representation learning aims to extract explanatory features or factors and retain salient information. Factorized hierarchical variational autoencoder (FHVAE) presents a way to disentangle a speech signal into sequential-level…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-06 Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

This paper introduces a novel voice conversion (VC) model, guided by text instructions such as "articulate slowly with a deep tone" or "speak in a cheerful boyish voice". Unlike traditional methods that rely on reference utterances to…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-17 Chun-Yi Kuan, Chen An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-yiin Chang, Hung-yi Lee

Using unsupervised learning to disentangle speech into content, rhythm, pitch, and timbre for voice conversion has become a hot research topic. Existing works generally take into account disentangling speech components through human-crafted…

Sound · Computer Science 2024-05-01 Ziqi Liang, Jianzong Wang, Xulong Zhang, Yong Zhang, Ning Cheng, Jing Xiao

Voice conversion (VC) using deep learning technologies can now generate high quality one-to-many voices and thus has been used in some practical application fields, such as entertainment and healthcare. However, voice conversion can pose…

Sound · Computer Science 2024-05-02 Qiang Huang

One-shot voice conversion aims to change the timbre of any source speech to match that of the unseen target speaker with only one speech sample. Existing methods face difficulties in satisfactory speech representation disentanglement and…

Sound · Computer Science 2024-11-26 Pengcheng Li, Jianzong Wang, Xulong Zhang, Yong Zhang, Jing Xiao, Ning Cheng

We propose noise-robust voice conversion (VC) which takes into account the recording quality and environment of noisy source speech. Conventional denoising training improves the noise robustness of a VC model by learning noisy-to-clean VC…

In this paper, we focus on improving the performance of the text-dependent speaker verification system in the scenario of limited training data. A deep-learning-based text-dependent speaker verification system generally needs a large…

Sound · Computer Science 2020-11-24 Xiaoyi Qin, Yaogen Yang, Lin Yang, Xuyang Wang, Junjie Wang, Ming Li

The objective of this paper is to learn representations of speaker identity without access to manually annotated data. To do so, we develop a self-supervised learning objective that exploits the natural cross-modal synchrony between faces…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-05 Arsha Nagrani, Joon Son Chung, Samuel Albanie, Andrew Zisserman

Speech representation learning with self-supervised algorithms has resulted in notable performance boosts in many downstream tasks. Recent work combined self-supervised learning (SSL) and visually grounded speech (VGS) processing mechanisms…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-08 Khazar Khorrami, María Andrea Cruz Blandón, Tuomas Virtanen, Okko Räsänen