Related papers: Diffusion-Based Voice Conversion with Fast Maximum…

200 papers

Speaker identity is one of the important characteristics of human speech. In voice conversion, we change the speaker identity from one to another, while keeping the linguistic content unchanged. Voice conversion involves multiple speech…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-18 Berrak Sisman, Junichi Yamagishi, Simon King, Haizhou Li

Recently, voice conversion (VC) without parallel data has been successfully adapted to multi-target scenario in which a single model is trained to convert the input voice to many different speakers. However, such model suffers from the…

Machine Learning · Computer Science 2019-08-23 Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee

Although diffusion models in text-to-speech have become a popular choice due to their strong generative ability, the intrinsic complexity of sampling from diffusion models harms their efficiency. Alternatively, we propose VoiceFlow, an…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-04 Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu

Voice conversion is a method that allows for the transformation of speaking style while maintaining the integrity of linguistic information. There are many researchers using deep generative models for voice conversion tasks. Generative…

Sound · Computer Science 2023-08-29 Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Diffusion Models are probabilistic models that create realistic samples by simulating the diffusion process, gradually adding and removing noise from data. These models have gained popularity in domains such as image processing, speech…

Computer Vision and Pattern Recognition · Computer Science 2024-08-21 Md Manjurul Ahsan, Shivakumar Raman, Yingtao Liu, Zahed Siddique

The goal of voice conversion is to transform the speech of a source speaker to sound like that of a reference speaker while preserving the original content. A key challenge is to extract disentangled linguistic content from the source and…

Sound · Computer Science 2025-01-15 Jaehun Kim, Ji-Hoon Kim, Yeunju Choi, Tan Dat Nguyen, Seongkyu Mun, Joon Son Chung

We introduce DiffuseST, a low-latency, direct speech-to-speech translation system capable of preserving the input speaker's voice zero-shot while translating from multiple source languages into English. We experiment with the synthesizer…

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-11 Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Voice conversion has gained increasing popularity within the field of audio manipulation and speech synthesis. Often, the main objective is to transfer the input identity to that of a target speaker without changing its linguistic content.…

Sound · Computer Science 2024-08-30 Anders R. Bargum, Simon Lajboschitz, Cumhur Erkut

One-shot style transfer is a challenging task, since training on one utterance makes model extremely easy to over-fit to training data and causes low speaker similarity and lack of expressiveness. In this paper, we build on the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-22 Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi

Generative AI has demonstrated impressive performance in various fields, among which speech synthesis is an interesting direction. With the diffusion model as the most popular generative model, numerous works have attempted two active…

Diffusion models have shown exceptional scaling properties in the image synthesis domain, and initial attempts have shown similar benefits for applying diffusion to unconditional text synthesis. Denoising diffusion models attempt to…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-17 Matthew Baas, Kevin Eloff, Herman Kamper

Advancements in artificial intelligence and machine learning have significantly improved synthetic speech generation. This paper explores diffusion models, a novel method for creating realistic synthetic speech. We create a diffusion…

Cryptography and Security · Computer Science 2025-01-15 Anton Firc, Kamil Malinka, Petr Hanáček

One-shot voice conversion (VC) aims to convert speech from any source speaker to an arbitrary target speaker with only a few seconds of reference speech from the target speaker. This relies heavily on disentangling the speaker's identity…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-02 Yinghao Aaron Li, Cong Han, Nima Mesgarani

Singing voice conversion (SVC) is one promising technique which can enrich the way of human-computer interaction by endowing a computer the ability to produce high-fidelity and expressive singing voice. In this paper, we propose DiffSVC, an…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-31 Songxiang Liu, Yuewen Cao, Dan Su, Helen Meng

Singing voice conversion is to convert the source singing voice into the target singing voice except for the content. Currently, flow-based models can complete the task of voice conversion, but they struggle to effectively extract latent…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-10 Hui Li, Hongyu Wang, Zhijin Chen, Bohan Sun, Bo Li

Applying changes to an input speech signal to change the perceived speaker of speech to a target while maintaining the content of the input is a challenging but interesting task known as Voice conversion (VC). Over the last few years, this…

Sound · Computer Science 2022-12-29 Olga Slizovskaia, Jordi Janer, Pritish Chandna, Oscar Mayor

Speech-to-speech translation is a typical sequence-to-sequence learning task that naturally has two directions. How to effectively leverage bidirectional supervision signals to produce high-fidelity audio for both directions? Existing…

Computation and Language · Computer Science 2023-05-23 Xianchao Wu

We propose a highly controllable voice manipulation system that can perform any-to-any voice conversion (VC) and prosody modulation simultaneously. State-of-the-art VC systems can transfer sentence-level characteristics such as speaker,…

Sound · Computer Science 2023-09-08 Kyungguen Byun, Sunkuk Moon, Erik Visser

In this work, we present DiffVoice, a novel text-to-speech model based on latent diffusion. We propose to first encode speech signals into a phoneme-rate latent representation with a variational autoencoder enhanced by adversarial training,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-04-25 Zhijun Liu, Yiwei Guo, Kai Yu