Related papers: Singer Identity Representation Learning using Self…

VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation

Singing Voice Synthesis (SVS) has witnessed significant advancements with the advent of deep learning techniques. However, a significant challenge in SVS is the scarcity of labeled singing voice data, which limits the effectiveness of…

Sound · Computer Science 2024-12-17 Yifeng Yu, Jiatong Shi, Yuning Wu, Yuxun Tang, Shinji Watanabe

Self-Supervised Representations for Singing Voice Conversion

A singing voice conversion model converts a song in the voice of an arbitrary source singer to the voice of a target singer. Recently, methods that leverage self-supervised audio representations such as HuBERT and Wav2Vec 2.0 have helped…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-23 Tejas Jayashankar, Jilong Wu, Leda Sari, David Kant, Vimal Manohar, Qing He

Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders

We propose a flexible framework that deals with both singer conversion and singers' vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-26 Yin-Jyun Luo, Chin-Chen Hsu, Kat Agres, Dorien Herremans

Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies

Automatic singing voice understanding tasks, such as singer identification, singing voice transcription, and singing technique classification, benefit from data-driven approaches that utilize deep learning techniques. These approaches work…

Sound · Computer Science 2023-09-06 Yuya Yamamoto

Learning a Joint Embedding Space of Monophonic and Mixed Music Signals for Singing Voice

Previous approaches in singer identification have used either monophonic vocal tracks or mixed tracks containing multiple instruments, leaving a semantic gap between these two domains of audio. In this paper, we present a system to learn a…

Sound · Computer Science 2019-06-27 Kyungyun Lee, Juhan Nam

Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer

This paper presents a high-quality singing synthesizer that is able to model a voice with limited available recordings. Based on the sequence-to-sequence singing model, we design a multi-singer framework to leverage all the existing singing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-19 Jie Wu, Jian Luan

Singing Beat Tracking With Self-supervised Front-end and Linear Transformers

Tracking beats of singing voices without the presence of musical accompaniment can find many applications in music production, automatic song arrangement, and social media interaction. Its main challenge is the lack of strong rhythmic and…

Audio and Speech Processing · Electrical Eng. & Systems 2022-09-01 Mojtaba Heydari, Zhiyao Duan

Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher

Building a high-quality singing corpus for a person who is not good at singing is non-trivial, thus making it challenging to create a singing voice synthesizer for this person. Learn2Sing is dedicated to synthesizing the singing voice of a…

Sound · Computer Science 2022-05-27 Heyang Xue, Xinsheng Wang, Yongmao Zhang, Lei Xie, Pengcheng Zhu, Mengxiao Bi

Semi-supervised Learning for Singing Synthesis Timbre

We propose a semi-supervised singing synthesizer, which is able to learn new voices from audio data only, without any annotations such as phonetic segmentation. Our system is an encoder-decoder model with two encoders, linguistic and…

Sound · Computer Science 2020-11-06 Jordi Bonada, Merlijn Blaauw

An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures

With the rapid development of neural network architectures and speech processing models, singing voice synthesis with neural networks is becoming the cutting-edge technique of digital music production. In this work, in order to explore how…

Sound · Computer Science 2021-08-29 Dengfeng Ke, Yuxing Lu, Xudong Liu, Yanyan Xu, Jing Sun, Cheng-Hao Cai

Deep Audio-Visual Singing Voice Transcription based on Self-Supervised Learning Models

Singing voice transcription converts recorded singing audio to musical notation. Sound contamination (such as accompaniment) and lack of annotated data make singing voice transcription an extremely difficult task. We take two approaches to…

Sound · Computer Science 2023-04-25 Xiangming Gu, Wei Zeng, Jianan Zhang, Longshen Ou, Ye Wang

Visually Guided Self Supervised Learning of Speech Representations

Self-supervised representation learning has recently attracted a lot of research interest for both the audio and visual modalities. However, most works typically focus on a particular modality or feature alone and there has been very…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-21 Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic

Unsupervised Singing Voice Conversion

We present a deep learning method for singing voice conversion. The proposed network is not conditioned on the text or on the notes, and it directly converts the audio of one singer to the voice of another. Training is performed without any…

Machine Learning · Computer Science 2019-09-26 Eliya Nachmani, Lior Wolf

Unsupervised Interpretable Representation Learning for Singing Voice Separation

In this work, we present a method for learning interpretable music signal representations directly from waveform signals. Our method can be trained using unsupervised objectives and relies on the denoising auto-encoder model that uses a…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-02 Stylianos I. Mimilakis, Konstantinos Drossos, Gerald Schuller

Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches

Recent advances in singing voice synthesis (SVS) have attracted substantial attention from both academia and industry. With the advent of large language models and novel generative paradigms, producing controllable, high-fidelity singing…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-22 Changhao Pan, Dongyu Yao, Yu Zhang, Wenxiang Guo, Jingyu Lu, Zhiyuan Zhu, Zhou Zhao

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

Wav2vec-C introduces a novel representation learning technique combining elements from wav2vec 2.0 and VQ-VAE. Our model learns to reproduce quantized representations from partially masked speech encoding using a contrastive loss in a way…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-25 Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

Exploring wav2vec 2.0 on speaker verification and language identification

Wav2vec 2.0 is a recently proposed self-supervised framework for speech representation learning. It follows a two-stage training process of pre-training and fine-tuning, and performs well in speech recognition tasks, especially ultra-low…

Sound · Computer Science 2021-01-15 Zhiyun Fan, Meng Li, Shiyu Zhou, Bo Xu

Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure. Some…

Machine Learning · Computer Science 2019-04-09 Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio

A Survey on Recent Deep Learning-driven Singing Voice Synthesis Systems

Singing voice synthesis (SVS) is a task that aims to generate audio signals according to musical scores and lyrics. With its multifaceted nature concerning music and language, producing singing voices indistinguishable from that of human…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-07 Yin-Ping Cho, Fu-Rong Yang, Yung-Chuan Chang, Ching-Ting Cheng, Xiao-Han Wang, Yi-Wen Liu

Evaluating Speaker Identity Coding in Self-supervised Models and Humans

Speaker identity plays a significant role in human communication and is being increasingly used in societal applications, many through advances in machine learning. Speaker identity perception is an essential cognitive phenomenon that can…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-18 Gasser Elbanna