
Related papers: SelfVC: Voice Conversion With Iterative Refinement…

200 papers

This paper presents a method of sequence-to-sequence (seq2seq) voice conversion using non-parallel training data. In this method, disentangled linguistic and speaker representations are extracted from acoustic features, and voice conversion…

Audio and Speech Processing · Electrical Eng. & Systems 2020-01-14 Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

Voice Conversion (VC) aims to modify a speaker's timbre while preserving linguistic content. While recent VC models achieve strong performance, most struggle in real-time streaming scenarios due to high latency, dependence on ASR modules,…

Sound · Computer Science 2025-10-13 Zhao Guo, Ziqian Ning, Guobin Ma, Lei Xie

Modern speech enhancement (SE) networks typically implement noise suppression through time-frequency masking, latent representation masking, or discriminative signal prediction. In contrast, some recent works explore SE via generative…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-07 Bryce Irvin, Marko Stamenovic, Mikolaj Kegler, Li-Chia Yang

Voice conversion (VC) consists of digitally altering the voice of an individual to manipulate part of its content, primarily its identity, while maintaining the rest unchanged. Research in neural VC has accomplished considerable…

Sound · Computer Science 2021-07-28 Laurent Benaroya, Nicolas Obin, Axel Roebel

Voice conversion (VC) using sequence-to-sequence learning of context posterior probabilities is proposed. Conventional VC using shared context posterior probabilities predicts target speech parameters from the context posterior…

Sound · Computer Science 2017-08-08 Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

We propose using self-supervised discrete representations for the task of speech resynthesis. To generate disentangled representation, we separately extract low-bitrate representations for speech content, prosodic information, and speaker…

Voice conversion (VC) can be achieved by first extracting source content information and target speaker information, and then reconstructing the waveform from this information. However, current approaches normally either extract dirty content…

Sound · Computer Science 2022-10-28 Jingyi Li, Weiping Tu, Li Xiao

Style voice conversion aims to transform the speaking style of source speech into a desired style while keeping the original speaker's identity. However, previous style voice conversion approaches primarily focus on well-defined domains…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-09 Xinfa Zhu, Lei He, Yujia Xiao, Xi Wang, Xu Tan, Sheng Zhao, Lei Xie

Voice conversion is the task of transforming the voice characteristics of source speech while preserving its content information. Self-supervised representation learning models are increasingly used for content extraction. However, in…

Sound · Computer Science 2024-05-02 Yimin Deng, Jianzong Wang, Xulong Zhang, Ning Cheng, Jing Xiao

One-shot voice conversion (VC) aims to convert speech from any source speaker to an arbitrary target speaker with only a few seconds of reference speech from the target speaker. This relies heavily on disentangling the speaker's identity…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-02 Yinghao Aaron Li, Cong Han, Nima Mesgarani

Noise suppression (NS) algorithms are effective in improving speech quality in many cases. However, aggressive noise suppression can damage the target speech, reducing both speech intelligibility and quality despite removing the noise. This…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-11 Kyungguen Byun, Jason Filos, Erik Visser, Sunkuk Moon

We introduce LinearVC, a simple voice conversion method that sheds light on the structure of self-supervised representations. First, we show that simple linear transformations of self-supervised features effectively convert voices. Next, we…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-03 Herman Kamper, Benjamin van Niekerk, Julian Zaïdi, Marc-André Carbonneau

Zero-shot voice conversion aims to transfer the voice of a source speaker to that of a speaker unseen during training, while preserving the content information. Although various methods have been proposed to reconstruct speaker information…

Sound · Computer Science 2024-08-22 Anastasia Avdeeva, Aleksei Gusev

Melody preservation is crucial in singing voice conversion (SVC). However, in many scenarios, audio is accompanied by background music (BGM), which can cause audio distortion and interfere with the extraction of melody and other key…

Sound · Computer Science 2025-02-10 Wei Chen, Binzhu Sha, Jing Yang, Zhuo Wang, Fan Fan, Zhiyong Wu

In this paper, we propose an invertible deep learning framework called INVVC for voice conversion. It is designed to counter the potential threats that inherently accompany voice conversion systems. Specifically, we develop an invertible…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-27 Zexin Cai, Ming Li

In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-18 Dariusz Piotrowski, Renard Korzeniowski, Alessio Falai, Sebastian Cygert, Kamil Pokora, Georgi Tinchev, Ziyao Zhang, Kayoko Yanagisawa

Any-to-any voice conversion (VC) aims to convert the timbre of utterances from and to any speaker, whether seen or unseen during training. Various any-to-any VC approaches have been proposed, such as AUTOVC, AdaINVC, and FragmentVC. AUTOVC and AdaINVC…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-15 Jheng-hao Lin, Yist Y. Lin, Chung-Ming Chien, Hung-yi Lee

We propose a flexible framework that handles both singer conversion and singers' vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-26 Yin-Jyun Luo, Chin-Chen Hsu, Kat Agres, Dorien Herremans

Voice conversion is the task of converting a non-linguistic feature of a given utterance. Since the naturalness of speech strongly depends on its pitch pattern, in some applications it would be desirable to keep the original rise/fall pitch pattern…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-21 Chihiro Watanabe, Hirokazu Kameoka

Zero-shot voice conversion (VC) aims to transform source speech into an arbitrary unseen target voice while keeping the linguistic content unchanged. Recent VC methods have made significant progress, but semantic losses in the decoupling…

Sound · Computer Science 2024-06-17 Linhan Ma, Xinfa Zhu, Yuanjun Lv, Zhichao Wang, Ziqian Wang, Wendi He, Hongbin Zhou, Lei Xie