Related papers: Speech-to-Singing Conversion in an Encoder-Decoder…

An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures

With the rapid development of neural network architectures and speech processing models, singing voice synthesis with neural networks is becoming the cutting-edge technique of digital music production. In this work, in order to explore how…

Sound · Computer Science 2021-08-29 Dengfeng Ke, Yuxing Lu, Xudong Liu, Yanyan Xu, Jing Sun, Cheng-Hao Cai

Semi-supervised Learning for Singing Synthesis Timbre

We propose a semi-supervised singing synthesizer, which is able to learn new voices from audio data only, without any annotations such as phonetic segmentation. Our system is an encoder-decoder model with two encoders, linguistic and…

Sound · Computer Science 2020-11-06 Jordi Bonada, Merlijn Blaauw

Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders

We propose a flexible framework that deals with both singer conversion and singers' vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-26 Yin-Jyun Luo, Chin-Chen Hsu, Kat Agres, Dorien Herremans

Learning Singing From Speech

We propose an algorithm that is capable of synthesizing a high-quality singing voice for a target speaker given only their normal speech samples. The proposed algorithm first integrates speech and singing synthesis into a unified framework, and…

Sound · Computer Science 2019-12-24 Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu

A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction

This paper presents a new voice conversion model capable of transforming both speaking and singing voices. It addresses key challenges in current systems, such as conveying emotions, managing pronunciation and accent changes, and…

Sound · Computer Science 2024-12-12 Sowmya Cheripally

Serenade: A Singing Style Conversion Framework Based On Audio Infilling

We propose Serenade, a novel framework for the singing style conversion (SSC) task. Although singer identity conversion has made great strides in recent years, converting the singing style of a singer has been an unexplored research…

Sound · Computer Science 2025-07-08 Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda

A Recurrent Encoder-Decoder Approach with Skip-filtering Connections for Monaural Singing Voice Separation

The objective of deep learning methods based on encoder-decoder architectures for music source separation is to approximate either ideal time-frequency masks or spectral representations of the target music source(s). The spectral…

Sound · Computer Science 2018-04-25 Stylianos Ioannis Mimilakis, Konstantinos Drossos, Tuomas Virtanen, Gerald Schuller

Singer Identity Representation Learning using Self-Supervised Techniques

Significant strides have been made in creating voice identity representations using speech data. However, the same level of progress has not been achieved for singing voices. To bridge this gap, we suggest a framework for training singer…

Sound · Computer Science 2024-01-11 Bernardo Torres, Stefan Lattner, Gaël Richard

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

In this paper, we propose a model to perform style transfer of speech to singing voice. Contrary to the previous signal processing-based methods, which require high-quality singing templates or phoneme synchronization, we explore a…

Sound · Computer Science 2022-08-29 Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi

SingIt! Singer Voice Transformation

In this paper, we propose a model that can generate a singing voice from a normal speech utterance by harnessing zero-shot, many-to-many style transfer learning. Our goal is to give anyone the opportunity to sing any song in a timely manner.…

Audio and Speech Processing · Electrical Eng. & Systems 2024-05-09 Amit Eliav, Aaron Taub, Renana Opochinsky, Sharon Gannot

DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System

Singing voice conversion converts the timbre of the source singing to the target speaker's voice while keeping the singing content the same. However, singing data for the target speaker is much more difficult to collect compared with normal…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-10 Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Chunlei Zhang, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu

Sequence-to-sequence Singing Synthesis Using the Feed-forward Transformer

We propose a sequence-to-sequence singing synthesizer, which avoids the need for training data with pre-aligned phonetic and acoustic features. Rather than the more common approach of a content-based attention mechanism combined with an…

Sound · Computer Science 2020-02-21 Merlijn Blaauw, Jordi Bonada

Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data

We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems. It enhances the ability of adapting one modality (i.e., source-language speech) to another (i.e., source-language text). Thus, the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-06 Yuhao Zhang, Chen Xu, Bojie Hu, Chunliang Zhang, Tong Xiao, Jingbo Zhu

Zero-shot Singing Technique Conversion

In this paper, we propose modifications to the neural network framework AutoVC for the task of singing technique conversion. This includes utilising a pretrained singing technique encoder which extracts technique information, upon which a…

Sound · Computer Science 2021-11-18 Brendan O'Connor, Simon Dixon, George Fazekas

Audio-Linguistic Embeddings for Spoken Sentences

We propose spoken sentence embeddings which capture both acoustic and linguistic content. While existing works operate at the character, phoneme, or word level, our method learns long-term dependencies by modeling speech at the sentence…

Sound · Computer Science 2019-02-22 Albert Haque, Michelle Guo, Prateek Verma, Li Fei-Fei

Disentangling Timbre and Singing Style with Multi-singer Singing Synthesis System

In this study, we define the identity of the singer with two independent concepts - timbre and singing style - and propose a multi-singer singing synthesis system that can model them separately. To this end, we extend our single-singer…

Sound · Computer Science 2019-10-30 Juheon Lee, Hyeong-Seok Choi, Junghyun Koo, Kyogu Lee

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data

Singing voice conversion aims to convert a singer's voice from source to target without changing the singing content. Parallel training data is typically required for training a singing voice conversion system, which is, however, not practical…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-04 Junchen Lu, Kun Zhou, Berrak Sisman, Haizhou Li

Transfer Learning from Speech Synthesis to Voice Conversion with Non-Parallel Training Data

This paper presents a novel framework for building a voice conversion (VC) system by learning from a text-to-speech (TTS) synthesis system, which we call TTS-VC transfer learning. We first develop a multi-speaker speech synthesis system with…

Audio and Speech Processing · Electrical Eng. & Systems 2021-01-07 Mingyang Zhang, Yi Zhou, Li Zhao, Haizhou Li

Unsupervised Singing Voice Conversion

We present a deep learning method for singing voice conversion. The proposed network is not conditioned on the text or on the notes, and it directly converts the audio of one singer to the voice of another. Training is performed without any…

Machine Learning · Computer Science 2019-09-26 Eliya Nachmani, Lior Wolf

Singing voice conversion with non-parallel data

Singing voice conversion is the task of converting a song sung by a source singer to the voice of a target singer. In this paper, we propose using a parallel-data-free, many-to-one voice conversion technique on singing voices. A phonetic…

Audio and Speech Processing · Electrical Eng. & Systems 2019-03-12 Xin Chen, Wei Chu, Jinxi Guo, Ning Xu