Related papers: Self-Supervised Representations for Singing Voice …

Singer Identity Representation Learning using Self-Supervised Techniques

Significant strides have been made in creating voice identity representations using speech data. However, the same level of progress has not been achieved for singing voices. To bridge this gap, we suggest a framework for training singer…

Sound · Computer Science 2024-01-11 Bernardo Torres, Stefan Lattner, Gaël Richard

Unsupervised Cross-Domain Singing Voice Conversion

We present a wav-to-wav generative model for the task of singing voice conversion from any identity. Our method utilizes both an acoustic model, trained for the task of automatic speech recognition, together with melody extracted features…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-10 Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman

Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders

We propose a flexible framework that deals with both singer conversion and singers' vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-26 Yin-Jyun Luo, Chin-Chen Hsu, Kat Agres, Dorien Herremans

A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction

This paper presents a new voice conversion model capable of transforming both speaking and singing voices. It addresses key challenges in current systems, such as conveying emotions, managing pronunciation and accent changes, and…

Sound · Computer Science 2024-12-12 Sowmya Cheripally

Unsupervised Singing Voice Conversion

We present a deep learning method for singing voice conversion. The proposed network is not conditioned on the text or on the notes, and it directly converts the audio of one singer to the voice of another. Training is performed without any…

Machine Learning · Computer Science 2019-09-26 Eliya Nachmani, Lior Wolf

SYKI-SVC: Advancing Singing Voice Conversion with Post-Processing Innovations and an Open-Source Professional Testset

Singing voice conversion aims to transform a source singing voice into that of a target singer while preserving the original lyrics, melody, and various vocal techniques. In this paper, we propose a high-fidelity singing voice conversion…

Sound · Computer Science 2025-01-07 Yiquan Zhou, Wenyu Wang, Hongwu Ding, Jiacheng Xu, Jihua Zhu, Xin Gao, Shihao Li

Singing voice conversion with non-parallel data

Singing voice conversion is a task to convert a song sung by a source singer to the voice of a target singer. In this paper, we propose using a parallel-data-free, many-to-one voice conversion technique on singing voices. A phonetic…

Audio and Speech Processing · Electrical Eng. & Systems 2019-03-12 Xin Chen, Wei Chu, Jinxi Guo, Ning Xu

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

We propose SelfVC, a training strategy to iteratively improve a voice conversion model with self-synthesized examples. Previous efforts on voice conversion focus on factorizing speech into explicitly disentangled representations that…

Sound · Computer Science 2024-05-06 Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data

Singing voice conversion aims to convert singer's voice from source to target without changing singing content. Parallel training data is typically required for the training of singing voice conversion system, that is however not practical…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-04 Junchen Lu, Kun Zhou, Berrak Sisman, Haizhou Li

Real-Time and Accurate: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis

Singing voice conversion is to convert the source singing voice into the target singing voice except for the content. Currently, flow-based models can complete the task of voice conversion, but they struggle to effectively extract latent…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-10 Hui Li, Hongyu Wang, Zhijin Chen, Bohan Sun, Bo Li

R2-SVC: Towards Real-World Robust and Expressive Zero-shot Singing Voice Conversion

In real-world singing voice conversion (SVC) applications, environmental noise and the demand for expressive output pose significant challenges. Conventional methods, however, are typically designed without accounting for real deployment…

Sound · Computer Science 2025-10-24 Junjie Zheng, Gongyu Chen, Chaofan Ding, Zihao Chen

SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion

Diffusion-based singing voice conversion (SVC) models have shown better synthesis quality compared to traditional methods. However, in cross-domain SVC scenarios, where there is a significant disparity in pitch between the source and target…

Sound · Computer Science 2024-06-12 Bingsong Bai, Fengping Wang, Yingming Gao, Ya Li

Singing Voice Conversion with Accompaniment Using Self-Supervised Representation-Based Melody Features

Melody preservation is crucial in singing voice conversion (SVC). However, in many scenarios, audio is often accompanied with background music (BGM), which can cause audio distortion and interfere with the extraction of melody and other key…

Sound · Computer Science 2025-02-10 Wei Chen, Binzhu Sha, Jing Yang, Zhuo Wang, Fan Fan, Zhiyong Wu

VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation

Singing Voice Synthesis (SVS) has witnessed significant advancements with the advent of deep learning techniques. However, a significant challenge in SVS is the scarcity of labeled singing voice data, which limits the effectiveness of…

Sound · Computer Science 2024-12-17 Yifeng Yu, Jiatong Shi, Yuning Wu, Yuxun Tang, Shinji Watanabe

Unsupervised Interpretable Representation Learning for Singing Voice Separation

In this work, we present a method for learning interpretable music signal representations directly from waveform signals. Our method can be trained using unsupervised objectives and relies on the denoising auto-encoder model that uses a…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-02 Stylianos I. Mimilakis, Konstantinos Drossos, Gerald Schuller

VITS-Based Singing Voice Conversion Leveraging Whisper and multi-scale F0 Modeling

This paper introduces the T23 team's system submitted to the Singing Voice Conversion Challenge 2023. Following the recognition-synthesis framework, our singing conversion model is based on VITS, incorporating four key modules: a prior…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-05 Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie

PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network

Singing voice conversion is to convert a singer's voice to another one's voice without changing singing content. Recent work shows that unsupervised singing voice conversion can be achieved with an autoencoder-based approach [1]. However,…

Sound · Computer Science 2020-02-19 Chengqi Deng, Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu

VibE-SVC: Vibrato Extraction with High-frequency F0 Contour for Singing Voice Conversion

Controlling singing style is crucial for achieving an expressive and natural singing voice. Among the various style factors, vibrato plays a key role in conveying emotions and enhancing musical depth. However, modeling vibrato remains…

Sound · Computer Science 2025-10-07 Joon-Seung Choi, Dong-Min Byun, Hyung-Seok Oh, Seong-Whan Lee

Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding

Recently, phonetic posteriorgrams (PPGs) based methods have been quite popular in non-parallel singing voice conversion systems. However, due to the lack of acoustic information in PPGs, style and naturalness of the converted singing voices…

Sound · Computer Science 2021-10-12 Chao Wang, Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Yibiao Yu, Zejun Ma

Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling

This paper presents a method of using autoregressive neural networks for the acoustic modeling of singing voice synthesis (SVS). Singing voice differs from speech and it contains more local dynamic movements of acoustic features, e.g.,…

Sound · Computer Science 2019-06-24 Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, Li-Rong Dai