Related papers: Voice conversion using coefficient mapping and neu…

Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities

Voice conversion (VC) using sequence-to-sequence learning of context posterior probabilities is proposed. Conventional VC using shared context posterior probabilities predicts target speech parameters from the context posterior…

Sound · Computer Science 2017-08-08 Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

On Using Backpropagation for Speech Texture Generation and Voice Conversion

Inspired by recent work on neural network image generation that relies on backpropagation towards the network inputs, we present a proof-of-concept system for speech texture synthesis and voice conversion based on two mechanisms: approximate…

Sound · Computer Science 2018-03-09 Jan Chorowski, Ron J. Weiss, Rif A. Saurous, Samy Bengio

LPCNet: Improving Neural Speech Synthesis Through Linear Prediction

Neural speech synthesis models have recently demonstrated the ability to synthesize high quality speech for text-to-speech and compression applications. These new models often require powerful GPUs to achieve real-time operation, so being…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-20 Jean-Marc Valin, Jan Skoglund

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

In this paper, we propose a model to perform style transfer of speech to singing voice. Contrary to the previous signal processing-based methods, which require high-quality singing templates or phoneme synchronization, we explore a…

Sound · Computer Science 2022-08-29 Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi

Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices

Voice conversion aims to convert source speech into a target voice using recordings of the target speaker as a reference. Newer models are producing increasingly realistic output. But what happens when models are fed with non-standard data,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-13 Matthew Baas, Herman Kamper

Vowels and Prosody Contribution in Neural Network Based Voice Conversion Algorithm with Noisy Training Data

This research presents a neural network based voice conversion (VC) model. While it is a known fact that voiced sounds and prosody are the most important components of the voice conversion framework, what is not known is their objective…

Audio and Speech Processing · Electrical Eng. & Systems 2020-03-11 Olaide Agbolade

End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation

Neural vocoders have recently demonstrated high quality speech synthesis, but typically require a high computational complexity. LPCNet was proposed as a way to reduce the complexity of neural synthesis by using linear prediction (LP) to…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-31 Krishna Subramani, Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy

Fast-VGAN: Lightweight Voice Conversion with Explicit Control of F0 and Duration Parameters

Precise control over speech characteristics, such as pitch, duration, and speech rate, remains a significant challenge in the field of voice conversion. The ability to manipulate parameters like pitch and syllable rate is an important…

Sound · Computer Science 2025-07-08 Mathilde Abrassart, Nicolas Obin, Axel Roebel

Mathematical Vocoder Algorithm : Modified Spectral Inversion for Efficient Neural Speech Synthesis

In this work, we propose a new mathematical vocoder algorithm (modified spectral inversion) that generates a waveform from acoustic features without phase estimation. The main benefit of using our proposed method is that it excludes the…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-17 Hyun Gon Ryu, Jeong-Hoon Kim, Simon See

Optimizing voice conversion network with cycle consistency loss of speaker identity

We propose a novel training scheme to optimize a voice conversion network with a speaker identity loss function. The training scheme minimizes not only frame-level spectral loss, but also speaker identity loss. We introduce a cycle…

Sound · Computer Science 2020-11-18 Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li

Speech Enhancement with Intelligent Neural Homomorphic Synthesis

Most neural network speech enhancement models ignore mathematical models of speech production by directly mapping Fourier transform spectra or waveforms. In this work, we propose a neural source filter network for speech enhancement.…

Sound · Computer Science 2022-10-31 Shulin He, Wei Rao, Jinjiang Liu, Jun Chen, Yukai Ju, Xueliang Zhang, Yannan Wang, Shidong Shang

Linear networks based speaker adaptation for speech synthesis

Speaker adaptation methods aim to create a fair-quality synthetic speech voice font for target speakers when only limited resources are available. Recently, as deep neural networks based statistical parametric speech synthesis (SPSS) methods…

Audio and Speech Processing · Electrical Eng. & Systems 2018-03-08 Zhiying Huang, Heng Lu, Ming Lei, Zhijie Yan

Sequence-to-Sequence Acoustic Modeling for Voice Conversion

In this paper, a neural network named Sequence-to-sequence ConvErsion NeTwork (SCENT) is presented for acoustic modeling in voice conversion. At the training stage, a SCENT model is estimated by aligning the feature sequences of source and…

Sound · Computer Science 2020-01-14 Jing-Xuan Zhang, Zhen-Hua Ling, Li-Juan Liu, Yuan Jiang, Li-Rong Dai

A two-stage full-band speech enhancement model with effective spectral compression mapping

The direct expansion of deep neural network (DNN) based wide-band speech enhancement (SE) to full-band processing faces the challenge of low frequency resolution in the low-frequency range, which would very likely lead to deteriorated…

Sound · Computer Science 2022-06-28 Zhongshu Hou, Qinwen Hu, Kai Chen, Jing Lu

Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters

Vocoders have received renewed attention as main components in statistical parametric text-to-speech (TTS) synthesis and speech transformation systems. Even though there are vocoding techniques that give almost acceptable synthesized speech, their high…

Sound · Computer Science 2021-06-22 Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh

PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion

Voice conversion, as the style transfer task applied to speech, refers to converting one person's speech into new speech that sounds like another person's. Up to now, there has been a lot of research devoted to better implementation of VC…

Sound · Computer Science 2023-08-23 Yimin Deng, Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Voice Conversion with Conditional SampleRNN

Here we present a novel approach to conditioning the SampleRNN generative model for voice conversion (VC). Conventional methods for VC modify the perceived speaker identity by converting between source and target acoustic features. Our…

Sound · Computer Science 2018-10-30 Cong Zhou, Michael Horgan, Vivek Kumar, Cristina Vasco, Dan Darcy

Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features

Teleconferencing is becoming essential during the COVID-19 pandemic. However, in real-world applications, speech quality can deteriorate due to, for example, background interference, noise, or reverberation. To solve this problem, target…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-02 Yicheng Hsu, Yonghan Lee, Mingsian R. Bai

DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion

Voice conversion is the task of converting a non-linguistic feature of a given utterance. Since naturalness of speech strongly depends on its pitch pattern, in some applications, it would be desirable to keep the original rise/fall pitch pattern…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-21 Chihiro Watanabe, Hirokazu Kameoka

Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network

In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN). The recently proposed LPCNet vocoder has successfully achieved high-quality and lightweight speech synthesis…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-03 Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang