Related papers: SelfVC: Voice Conversion With Iterative Refinement…

200 papers

Self-supervised learning (SSL) based speech pre-training has attracted much attention for its capability of extracting rich representations learned from massive unlabeled data. On the other hand, the use of weakly-supervised data is less…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-06-30 · Wangyou Zhang, Yanmin Qian

In real-world singing voice conversion (SVC) applications, environmental noise and the demand for expressive output pose significant challenges. Conventional methods, however, are typically designed without accounting for real deployment…

Sound · Computer Science · 2025-10-24 · Junjie Zheng, Gongyu Chen, Chaofan Ding, Zihao Chen

Streaming voice conversion has become increasingly popular for its potential in real-time applications. The recently proposed DualVC 2 has achieved robust and high-quality streaming voice conversion with a latency of about 180ms.…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-06-13 · Ziqian Ning, Shuai Wang, Pengcheng Zhu, Zhichao Wang, Jixun Yao, Lei Xie, Mengxiao Bi

Supervised speech enhancement relies on parallel databases of degraded speech signals and their clean reference signals during training. This setting prohibits the use of real-world degraded speech data that may better represent the…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-09-22 · Yangyang Xia, Buye Xu, Anurag Kumar

Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of…

Singing Voice Conversion (SVC) transfers a source singer's timbre to a target while keeping melody and lyrics. The key challenge in any-to-any SVC is adapting unseen speaker timbres to source audio without quality degradation. Existing…

Sound · Computer Science · 2025-08-11 · Wei Chen, Binzhu Sha, Dan Luo, Jing Yang, Zhuo Wang, Fan Fan, Zhiyong Wu

Generative audio technologies now enable highly realistic voice cloning and real-time voice conversion, increasing the risk of impersonation, fraud, and misinformation in communication channels such as phone and video calls. This study…

Sound · Computer Science · 2026-01-09 · Prajwal Chinchmalatpure, Suyash Chinchmalatpure, Siddharth Chavan

Emotional voice conversion (EVC) aims to change the emotional state of an utterance while preserving the linguistic content and speaker identity. In this paper, we propose a novel 2-stage training strategy for sequence-to-sequence emotional…

Computation and Language · Computer Science · 2021-06-10 · Kun Zhou, Berrak Sisman, Haizhou Li

In real-world voice conversion applications, environmental noise in source speech and user demands for expressive output pose critical challenges. Traditional ASR-based methods ensure noise robustness but suppress prosody richness, while…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-08-11 · Yuepeng Jiang, Ziqian Ning, Shuai Wang, Chengjia Wang, Mengxiao Bi, Pengcheng Zhu, Zhonghua Fu, Lei Xie

We present a method for transferring pre-trained self-supervised (SSL) speech representations to multiple languages. There is an abundance of unannotated speech, so creating self-supervised representations from raw audio and fine-tuning on…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-02-08 · Samuel Kessler, Bethan Thomas, Salah Karout

We introduce a novel sequence-to-sequence (seq2seq) voice conversion (VC) model based on the Transformer architecture with text-to-speech (TTS) pretraining. Seq2seq VC models are attractive owing to their ability to convert prosody. While…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-12-17 · Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda

Factorizing speech as disentangled speech representations is vital to achieve highly controllable style transfer in voice conversion (VC). Conventional speech representation learning methods in VC only factorize speech as speaker and…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-12-06 · Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu, Shiyin Kang, Helen Meng

Voice Conversion research in recent times has increasingly focused on improving the zero-shot capabilities of existing methods. Despite remarkable advancements, current architectures still tend to struggle in zero-shot cross-lingual…

Sound · Computer Science · 2025-05-26 · Advait Joglekar, Divyanshu Singh, Rooshil Rohit Bhatia, S. Umesh

This paper presents an adversarial learning method for recognition-synthesis based non-parallel voice conversion. A recognizer is used to transform acoustic features into linguistic representations while a synthesizer recovers output…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-08-07 · Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

We present a deep learning method for singing voice conversion. The proposed network is not conditioned on the text or on the notes, and it directly converts the audio of one singer to the voice of another. Training is performed without any…

Machine Learning · Computer Science · 2019-09-26 · Eliya Nachmani, Lior Wolf

Singing voice synthesis (SVS) has seen remarkable advancements in recent years. However, compared to speech and general audio data, publicly available singing datasets remain limited. In practice, this data scarcity often leads to…

Sound · Computer Science · 2025-12-17 · Yiwen Zhao, Jiatong Shi, Yuxun Tang, William Chen, Shinji Watanabe

Speech emotion conversion aims to convert the expressed emotion of a spoken utterance to a target emotion while preserving the lexical information and the speaker's identity. In this work, we specifically focus on in-the-wild emotion…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-06-06 · Navin Raj Prabhu, Nale Lehmann-Willenbrock, Timo Gerkmann

While many recent any-to-any voice conversion models succeed in transferring some target speech's style information to the converted speech, they still lack the ability to faithfully reproduce the speaking style of the target speaker. In…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-12-18 · Hyungseob Lim, Kyungguen Byun, Sunkuk Moon, Erik Visser

Voice conversion is a challenging task which transforms the voice characteristics of a source speaker to a target speaker without changing linguistic content. Recently, there have been many works on many-to-many Voice Conversion (VC) based…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-09-23 · Manh Luong, Viet Anh Tran

Singing voice conversion (SVC) is one promising technique which can enrich the way of human-computer interaction by endowing a computer with the ability to produce high-fidelity and expressive singing voice. In this paper, we propose DiffSVC, an…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-05-31 · Songxiang Liu, Yuewen Cao, Dan Su, Helen Meng