Related papers: Zero-shot Singing Technique Conversion

Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders

We propose a flexible framework that deals with both singer conversion and singers' vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-26 Yin-Jyun Luo, Chin-Chen Hsu, Kat Agres, Dorien Herremans

Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference

We propose a unified framework for Singing Voice Synthesis (SVS) and Conversion (SVC), addressing the limitations of existing approaches in cross-domain SVS/SVC, poor output musicality, and scarcity of singing data. Our framework enables…

Sound · Computer Science 2025-01-24 Shuqi Dai, Yunyun Wang, Roger B. Dannenberg, Zeyu Jin

Unsupervised Singing Voice Conversion

We present a deep learning method for singing voice conversion. The proposed network is not conditioned on the text or on the notes, and it directly converts the audio of one singer to the voice of another. Training is performed without any…

Machine Learning · Computer Science 2019-09-26 Eliya Nachmani, Lior Wolf

Singing voice conversion with non-parallel data

Singing voice conversion is the task of converting a song sung by a source singer to the voice of a target singer. In this paper, we propose using a parallel-data-free, many-to-one voice conversion technique on singing voices. A phonetic…

Audio and Speech Processing · Electrical Eng. & Systems 2019-03-12 Xin Chen, Wei Chu, Jinxi Guo, Ning Xu

PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network

Singing voice conversion is to convert a singer's voice to another one's voice without changing singing content. Recent work shows that unsupervised singing voice conversion can be achieved with an autoencoder-based approach [1]. However,…

Sound · Computer Science 2020-02-19 Chengqi Deng, Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu

AdaptVC: High Quality Voice Conversion with Adaptive Learning

The goal of voice conversion is to transform the speech of a source speaker to sound like that of a reference speaker while preserving the original content. A key challenge is to extract disentangled linguistic content from the source and…

Sound · Computer Science 2025-01-15 Jaehun Kim, Ji-Hoon Kim, Yeunju Choi, Tan Dat Nguyen, Seongkyu Mun, Joon Son Chung

A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion

Previous research has shown that established techniques for spoken voice conversion (VC) do not perform as well when applied to singing voice conversion (SVC). We propose an alternative loss component in a loss function that is otherwise…

Sound · Computer Science 2023-02-28 Brendan O'Connor, Simon Dixon

Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations

This study presents an innovative Zero-Shot any-to-any Singing Voice Conversion (SVC) method, leveraging a novel clustering-based phoneme representation to effectively separate content, timbre, and singing style. This approach enables…

Sound · Computer Science 2024-10-15 Wangjin Zhou, Fengrun Zhang, Yiming Liu, Wenhao Guan, Yi Zhao, Tatsuya Kawahara

SingIt! Singer Voice Transformation

In this paper, we propose a model which can generate a singing voice from normal speech utterance by harnessing zero-shot, many-to-many style transfer learning. Our goal is to give anyone the opportunity to sing any song in a timely manner.…

Audio and Speech Processing · Electrical Eng. & Systems 2024-05-09 Amit Eliav, Aaron Taub, Renana Opochinsky, Sharon Gannot

HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios

Zero-shot singing voice conversion (SVC) transforms a source singer's timbre to an unseen target speaker's voice while preserving melodic content without fine-tuning. Existing methods model speaker timbre and vocal content separately,…

Sound · Computer Science 2025-11-18 Bingsong Bai, Yizhong Geng, Fengping Wang, Cong Wang, Puyuan Guo, Yingming Gao, Ya Li

SYKI-SVC: Advancing Singing Voice Conversion with Post-Processing Innovations and an Open-Source Professional Testset

Singing voice conversion aims to transform a source singing voice into that of a target singer while preserving the original lyrics, melody, and various vocal techniques. In this paper, we propose a high-fidelity singing voice conversion…

Sound · Computer Science 2025-01-07 Yiquan Zhou, Wenyu Wang, Hongwu Ding, Jiacheng Xu, Jihua Zhu, Xin Gao, Shihao Li

Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features

Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker characteristic of an utterance to match an unseen target speaker without relying on parallel training data. Recently, self-supervised learning of speech representation…

Sound · Computer Science 2022-02-14 Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida

Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech

Zero-shot voice conversion aims to transfer the voice of a source speaker to that of a speaker unseen during training, while preserving the content information. Although various methods have been proposed to reconstruct speaker information…

Sound · Computer Science 2024-08-22 Anastasia Avdeeva, Aleksei Gusev

YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases

Singing voice conversion (SVC) aims to render the target singer's timbre while preserving melody and lyrics. However, existing zero-shot SVC systems remain fragile in real songs due to harmony interference, F0 errors, and the lack of…

Sound · Computer Science 2025-12-05 Gongyu Chen, Xiaoyu Zhang, Zhenqiang Weng, Junjie Zheng, Da Shen, Chaofan Ding, Wei-Qiang Zhang, Zihao Chen

NoiseVC: Towards High Quality Zero-Shot Voice Conversion

Voice conversion (VC) is a task that transforms voice from target audio to source without losing linguistic content; it is especially challenging when source and target speakers are unseen during training (zero-shot VC). Previous…

Sound · Computer Science 2021-04-14 Shijun Wang, Damian Borth

Self-Supervised Representations for Singing Voice Conversion

A singing voice conversion model converts a song in the voice of an arbitrary source singer to the voice of a target singer. Recently, methods that leverage self-supervised audio representations such as HuBERT and Wav2Vec 2.0 have helped…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-23 Tejas Jayashankar, Jilong Wu, Leda Sari, David Kant, Vimal Manohar, Qing He

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

We propose SelfVC, a training strategy to iteratively improve a voice conversion model with self-synthesized examples. Previous efforts on voice conversion focus on factorizing speech into explicitly disentangled representations that…

Sound · Computer Science 2024-05-06 Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data

Singing voice conversion aims to convert a singer's voice from source to target without changing the singing content. Parallel training data is typically required to train a singing voice conversion system, that is however not practical…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-04 Junchen Lu, Kun Zhou, Berrak Sisman, Haizhou Li

Serenade: A Singing Style Conversion Framework Based On Audio Infilling

We propose Serenade, a novel framework for the singing style conversion (SSC) task. Although singer identity conversion has made great strides in the previous years, converting the singing style of a singer has been an unexplored research…

Sound · Computer Science 2025-07-08 Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda

DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System

Singing voice conversion converts the timbre of the source singing to the target speaker's voice while keeping the singing content the same. However, singing data for a target speaker is much more difficult to collect compared with normal…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-10 Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Chunlei Zhang, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu