Related papers: SelfVC: Voice Conversion With Iterative Refinement…

200 papers

We present an approach for unsupervised learning of speech representation disentangling contents and styles. Our model consists of: (1) a local encoder that captures per-frame information; (2) a global encoder that captures per-utterance…

Computation and Language · Computer Science 2021-06-22 Andros Tjandra, Ruoming Pang, Yu Zhang, Shigeki Karita

This paper introduces voice reenactment as the task of voice conversion (VC) in which the expressivity of the source speaker is preserved during conversion while the identity of a target speaker is transferred. To do so, an original…

Sound · Computer Science 2022-06-01 Frederik Bous, Laurent Benaroya, Nicolas Obin, Axel Roebel

Sequence-to-sequence (seq2seq) voice conversion (VC) models are attractive owing to their ability to convert prosody. Nonetheless, without sufficient data, seq2seq VC models can suffer from unstable training and mispronunciation problems in…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-10 Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda

Modern speaker recognition systems rely on abundant and balanced datasets for classification training. However, diverse defective datasets, such as partially-labelled, small-scale, and imbalanced datasets, are common in real-world…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-03 Ruijie Tao, Zhan Shi, Yidi Jiang, Tianchi Liu, Haizhou Li

Singing voice conversion (SVC) aims to convert the voice of one singer to that of other singers while keeping the singing content and melody. On top of recent voice conversion works, we propose a novel model to steadily convert songs while…

Sound · Computer Science 2020-10-29 Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Ling Xu, Chen Shen, Zejun Ma

This paper introduces FastVC, an end-to-end model for fast Voice Conversion (VC). The proposed model can convert speech of arbitrary length from multiple source speakers to multiple target speakers. FastVC is based on a conditional…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-07 Oriol Barbany Mayor, Milos Cernak

The conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models like recurrent neural networks. Despite the good performance of…

Sound · Computer Science 2018-11-07 Santiago Pascual, Antonio Bonafonte, Joan Serrà

Self-supervised visual pretraining has shown significant progress recently. Among those methods, SimCLR greatly advanced the state of the art in self-supervised and semi-supervised learning on ImageNet. The input feature representations for…

Computation and Language · Computer Science 2021-07-06 Dongwei Jiang, Wubo Li, Miao Cao, Wei Zou, Xiangang Li

Speech enhancement has recently achieved great success with various deep learning methods. However, most conventional speech enhancement systems are trained with supervised methods that impose two significant challenges. First, a majority…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-22 Viet Anh Trinh, Sebastian Braun

Beyond the conventional voice conversion (VC) where the speaker information is converted without altering the linguistic content, the background sounds are informative and need to be retained in some real-world scenarios, such as VC in…

Sound · Computer Science 2021-11-16 Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda

Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the…

Computation and Language · Computer Science 2022-12-06 Ankita Pasad, Ju-Chieh Chou, Karen Livescu

Singing Voice Conversion (SVC) aims to transform a source singing voice into a target singer while preserving lyrics and melody. Most existing SVC methods depend on F0 extractors to capture the lead melody from clean vocals. However, no…

Sound · Computer Science 2026-05-13 Chen Geng, Meng Chen, Ruohua Zhou, Ruolan Liu, Weifeng Zhao

This work presents self-supervised learning methods for developing monaural speaker-specific (i.e., personalized) speech enhancement models. While generalist models must broadly address many speakers, specialist models can adapt their…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-28 Aswin Sivaraman, Minje Kim

A speaker verification (SV) system offers an authentication service designed to confirm whether a given speech sample originates from a specific speaker. This technology has paved the way for various personalized applications that cater to…

Cross-lingual voice conversion (VC) is a task that aims to synthesize target voices with the same content while source and target speakers speak in different languages. Its challenge lies in the fact that the source and target data are…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-01 Che-Jui Chang

Style voice conversion aims to transform the style of source speech to a desired style according to real-world application demands. However, the current style voice conversion approach relies on pre-defined labels or reference speech to…

Audio and Speech Processing · Electrical Eng. & Systems 2023-12-27 Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, Jingjing Yin, Hongbin Zhou, Heng Lu, Lei Xie

Expressive voice conversion aims to transfer both speaker identity and expressive attributes from a target speech to a given source speech. In this work, we improve over a self-supervised, non-autoregressive framework with a conditional…

Sound · Computer Science 2025-06-05 Seymanur Akti, Tuan Nam Nguyen, Alexander Waibel

Voice conversion (VC) is the task of transforming a person's voice to a different style while conserving linguistic content. The previous state of the art in VC is based on a sequence-to-sequence (seq2seq) model, which could mislead linguistic…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-28 Tae-Ho Kim, Sungjae Cho, Shinkook Choi, Sejik Park, Soo-Young Lee

Zero-shot voice conversion is becoming an increasingly popular research topic, as it promises the ability to transform speech to sound like any speaker. However, relatively little work has been done on end-to-end methods for this task,…

Audio and Speech Processing · Electrical Eng. & Systems 2024-04-04 Wonjune Kang, Mark Hasegawa-Johnson, Deb Roy

We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC). The poor quality of dysarthric speech can be greatly improved by statistical VC, but as the normal speech utterances of a dysarthria patient…
