Related papers: GenVC: Self-Supervised Zero-Shot Voice Conversion

200 papers

Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker characteristic of an utterance to match an unseen target speaker without relying on parallel training data. Recently, self-supervised learning of speech representation…

Sound · Computer Science 2022-02-14 Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida

Voice conversion (VC) systems are widely used for several applications, from speaker anonymisation to personalised speech synthesis. Supervised approaches learn a mapping between different speakers using parallel data, which is expensive to…

Zero-shot voice conversion aims to transfer the voice of a source speaker to that of a speaker unseen during training, while preserving the content information. Although various methods have been proposed to reconstruct speaker information…

Sound · Computer Science 2024-08-22 Anastasia Avdeeva, Aleksei Gusev

Voice Conversion research in recent times has increasingly focused on improving the zero-shot capabilities of existing methods. Despite remarkable advancements, current architectures still tend to struggle in zero-shot cross-lingual…

Sound · Computer Science 2025-05-26 Advait Joglekar, Divyanshu Singh, Rooshil Rohit Bhatia, S. Umesh

Voice Conversion (VC) for unseen speakers, also known as zero-shot VC, is an attractive research topic as it enables a range of applications like voice customizing, animation production, and others. Recent work in this area made progress…

Sound · Computer Science 2022-06-01 Shijun Wang, Dimche Kostadinov, Damian Borth

We propose SelfVC, a training strategy to iteratively improve a voice conversion model with self-synthesized examples. Previous efforts on voice conversion focus on factorizing speech into explicitly disentangled representations that…

The goal of voice conversion is to transform the speech of a source speaker to sound like that of a reference speaker while preserving the original content. A key challenge is to extract disentangled linguistic content from the source and…

Sound · Computer Science 2025-01-15 Jaehun Kim, Ji-Hoon Kim, Yeunju Choi, Tan Dat Nguyen, Seongkyu Mun, Joon Son Chung

Nowadays, as more and more systems achieve good performance in traditional voice conversion (VC) tasks, people's attention gradually turns to VC tasks under extreme conditions. In this paper, we propose a novel method for zero-shot voice…

Sound · Computer Science 2023-04-04 Haozhe Zhang, Zexin Cai, Xiaoyi Qin, Ming Li

Voice conversion is the task of converting a spoken utterance from a source speaker so that it appears to be said by a different target speaker while retaining the linguistic content of the utterance. Recent advances have led to major…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-02 Matthew Baas, Herman Kamper

We propose a neural network for zero-shot voice conversion (VC) without any parallel or transcribed data. Our approach uses pre-trained models for automatic speech recognition (ASR) and speaker embedding, obtained from a speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Yurii Rebryk, Stanislav Beliaev

In this work, we propose a zero-shot voice conversion method using speech representations trained with self-supervised learning. First, we develop a multi-task model to decompose a speech utterance into features such as linguistic content,…

Sound · Computer Science 2023-02-17 Shehzeen Hussain, Paarth Neekhara, Jocelyn Huang, Jason Li, Boris Ginsburg

Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker. While there is a rich literature on VC, most proposed methods are trained and…

This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims at converting the voice characteristics of an utterance from any source speaker to a newly coming target speaker, solely…

Sound · Computer Science 2023-09-19 Zheng-Yan Sheng, Yang Ai, Yan-Nian Chen, Zhen-Hua Ling

Voice Conversion (VC) modifies speech to match a target speaker while preserving linguistic content. Traditional methods usually extract speaker information directly from speech while neglecting the explicit utilization of linguistic…

Multimedia · Computer Science 2025-06-04 Fengjin Li, Jie Wang, Yadong Niu, Yongqing Wang, Meng Meng, Jian Luan, Zhiyong Wu

Non-parallel many-to-many voice conversion is recently attracting huge research efforts in the speech processing community. A voice conversion system transforms an utterance of a source speaker to another utterance of a target speaker by…

Sound · Computer Science 2020-10-27 Zining Zhang, Bingsheng He, Zhenjie Zhang

Recently, voice conversion (VC) has been widely studied. Many VC systems use disentangle-based learning techniques to separate the speaker and the linguistic content information from a speech signal. Subsequently, they convert the voice by…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-03 Yen-Hao Chen, Da-Yi Wu, Tsung-Han Wu, Hung-yi Lee

Voice conversion (VC) is a task that transforms voice from target audio to source without losing linguistic contents, it is challenging especially when source and target speakers are unseen during training (zero-shot VC). Previous…

Sound · Computer Science 2021-04-14 Shijun Wang, Damian Borth

Zero-shot voice conversion is becoming an increasingly popular research topic, as it promises the ability to transform speech to sound like any speaker. However, relatively little work has been done on end-to-end methods for this task,…

Audio and Speech Processing · Electrical Eng. & Systems 2024-04-04 Wonjune Kang, Mark Hasegawa-Johnson, Deb Roy

Zero-shot voice conversion (VC) aims to transform source speech into arbitrary unseen target voice while keeping the linguistic content unchanged. Recent VC methods have made significant progress, but semantic losses in the decoupling…

Sound · Computer Science 2024-06-17 Linhan Ma, Xinfa Zhu, Yuanjun Lv, Zhichao Wang, Ziqian Wang, Wendi He, Hongbin Zhou, Lei Xie

Zero-shot voice conversion (VC) aims to convert a source utterance into the voice of an unseen target speaker while preserving its linguistic content. Although recent systems have improved conversion quality, building zero-shot VC systems…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-23 Qixi Zheng, Yuxiang Zhao, Tianrui Wang, Wenxi Chen, Kele Xu, Yikang Li, Qinyuan Chen, Xipeng Qiu, Kai Yu, Xie Chen