Related papers: GenVC: Self-Supervised Zero-Shot Voice Conversion

Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features

Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker characteristic of an utterance to match an unseen target speaker without relying on parallel training data. Recently, self-supervised learning of speech representation…

Sound · Computer Science 2022-02-14 Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida

Investigating self-supervised features for expressive, multilingual voice conversion

Voice conversion (VC) systems are widely used for several applications, from speaker anonymisation to personalised speech synthesis. Supervised approaches learn a mapping between different speakers using parallel data, which is expensive to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-14 Álvaro Martín-Cortinas, Daniel Sáez-Trigueros, Grzegorz Beringer, Iván Vallés-Pérez, Roberto Barra-Chicote, Biel Tura-Vecino, Adam Gabryś, Piotr Bilinski, Thomas Merritt, Jaime Lorenzo-Trueba

Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech

Zero-shot voice conversion aims to transfer the voice of a source speaker to that of a speaker unseen during training, while preserving the content information. Although various methods have been proposed to reconstruct speaker information…

Sound · Computer Science 2024-08-22 Anastasia Avdeeva, Aleksei Gusev

EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion

Voice Conversion research in recent times has increasingly focused on improving the zero-shot capabilities of existing methods. Despite remarkable advancements, current architectures still tend to struggle in zero-shot cross-lingual…

Sound · Computer Science 2025-05-26 Advait Joglekar, Divyanshu Singh, Rooshil Rohit Bhatia, S. Umesh

Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning

Voice Conversion (VC) for unseen speakers, also known as zero-shot VC, is an attractive research topic as it enables a range of applications like voice customizing, animation production, and others. Recent work in this area made progress…

Sound · Computer Science 2022-06-01 Shijun Wang, Dimche Kostadinov, Damian Borth

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

We propose SelfVC, a training strategy to iteratively improve a voice conversion model with self-synthesized examples. Previous efforts on voice conversion focus on factorizing speech into explicitly disentangled representations that…

Sound · Computer Science 2024-05-06 Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

AdaptVC: High Quality Voice Conversion with Adaptive Learning

The goal of voice conversion is to transform the speech of a source speaker to sound like that of a reference speaker while preserving the original content. A key challenge is to extract disentangled linguistic content from the source and…

Sound · Computer Science 2025-01-15 Jaehun Kim, Ji-Hoon Kim, Yeunju Choi, Tan Dat Nguyen, Seongkyu Mun, Joon Son Chung

SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines

Nowadays, as more and more systems achieve good performance in traditional voice conversion (VC) tasks, people's attention gradually turns to VC tasks under extreme conditions. In this paper, we propose a novel method for zero-shot voice…

Sound · Computer Science 2023-04-04 Haozhe Zhang, Zexin Cai, Xiaoyi Qin, Ming Li

StarGAN-ZSVC: Towards Zero-Shot Voice Conversion in Low-Resource Contexts

Voice conversion is the task of converting a spoken utterance from a source speaker so that it appears to be said by a different target speaker while retaining the linguistic content of the utterance. Recent advances have led to major…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-02 Matthew Baas, Herman Kamper

ConVoice: Real-Time Zero-Shot Voice Style Transfer with Convolutional Network

We propose a neural network for zero-shot voice conversion (VC) without any parallel or transcribed data. Our approach uses pre-trained models for automatic speech recognition (ASR) and speaker embedding, obtained from a speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Yurii Rebryk, Stanislav Beliaev

ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations

In this work, we propose a zero-shot voice conversion method using speech representations trained with self-supervised learning. First, we develop a multi-task model to decompose a speech utterance into features such as linguistic content,…

Sound · Computer Science 2023-02-17 Shehzeen Hussain, Paarth Neekhara, Jocelyn Huang, Jason Li, Boris Ginsburg

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments

Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker. While there is a rich literature on VC, most proposed methods are trained and…

Sound · Computer Science 2021-06-17 Alejandro Mottini, Jaime Lorenzo-Trueba, Sri Vishnu Kumar Karlapati, Thomas Drugman

Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment

This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims at converting the voice characteristics of an utterance from any source speaker to a newly coming target speaker, solely…

Sound · Computer Science 2023-09-19 Zheng-Yan Sheng, Yang Ai, Yan-Nian Chen, Zhen-Hua Ling

StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion

Voice Conversion (VC) modifies speech to match a target speaker while preserving linguistic content. Traditional methods usually extract speaker information directly from speech while neglecting the explicit utilization of linguistic…

Multimedia · Computer Science 2025-06-04 Fengjin Li, Jie Wang, Yadong Niu, Yongqing Wang, Meng Meng, Jian Luan, Zhiyong Wu

GAZEV: GAN-Based Zero-Shot Voice Conversion over Non-parallel Speech Corpus

Non-parallel many-to-many voice conversion is recently attracting huge research efforts in the speech processing community. A voice conversion system transforms an utterance of a source speaker to another utterance of a target speaker by…

Sound · Computer Science 2020-10-27 Zining Zhang, Bingsheng He, Zhenjie Zhang

AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization

Recently, voice conversion (VC) has been widely studied. Many VC systems use disentangle-based learning techniques to separate the speaker and the linguistic content information from a speech signal. Subsequently, they convert the voice by…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-03 Yen-Hao Chen, Da-Yi Wu, Tsung-Han Wu, Hung-yi Lee

NoiseVC: Towards High Quality Zero-Shot Voice Conversion

Voice conversion (VC) is a task that transforms voice from target audio to source without losing linguistic contents, it is challenging especially when source and target speakers are unseen during training (zero-shot VC). Previous…

Sound · Computer Science 2021-04-14 Shijun Wang, Damian Borth

End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions

Zero-shot voice conversion is becoming an increasingly popular research topic, as it promises the ability to transform speech to sound like any speaker. However, relatively little work has been done on end-to-end methods for this task,…

Audio and Speech Processing · Electrical Eng. & Systems 2024-04-04 Wonjune Kang, Mark Hasegawa-Johnson, Deb Roy

Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy

Zero-shot voice conversion (VC) aims to transform source speech into arbitrary unseen target voice while keeping the linguistic content unchanged. Recent VC methods have made significant progress, but semantic losses in the decoupling…

Sound · Computer Science 2024-06-17 Linhan Ma, Xinfa Zhu, Yuanjun Lv, Zhichao Wang, Ziqian Wang, Wendi He, Hongbin Zhou, Lei Xie

X-VC: Zero-shot Streaming Voice Conversion in Codec Space

Zero-shot voice conversion (VC) aims to convert a source utterance into the voice of an unseen target speaker while preserving its linguistic content. Although recent systems have improved conversion quality, building zero-shot VC systems…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-23 Qixi Zheng, Yuxiang Zhao, Tianrui Wang, Wenxi Chen, Kele Xu, Yikang Li, Qinyuan Chen, Xipeng Qiu, Kai Yu, Xie Chen