Related papers: SelfVC: Voice Conversion With Iterative Refinement…

200 papers

Zero-shot voice conversion (VC) aims to transfer the timbre from the source speaker to an arbitrary unseen speaker while preserving the original linguistic content. Despite recent advancements in zero-shot VC using language model-based or…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-11 Jixun Yao, Yuguang Yang, Yu Pan, Ziqian Ning, Jiaohao Ye, Hongbin Zhou, Lei Xie

In this paper, we propose a model to perform style transfer of speech to singing voice. Contrary to the previous signal processing-based methods, which require high-quality singing templates or phoneme synchronization, we explore a…

Sound · Computer Science 2022-08-29 Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi

In a conventional voice conversion (VC) framework, a VC model is often trained with a clean dataset consisting of speech data carefully recorded and selected by minimizing background interference. However, collecting such a high-quality…

Sound · Computer Science 2021-09-23 Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda

In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent representations or embeddings containing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-05 Yoohwan Kwon, Soo-Whan Chung, Hong-Goo Kang

Speech-to-singing voice conversion (STS) task always suffers from data scarcity, because it requires paired speech and singing data. Compounding this issue are the challenges of content-pitch alignment and the suboptimal quality of…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-05 Ruiqi Li, Rongjie Huang, Yongqi Wang, Zhiqing Hong, Zhou Zhao

Voice Conversion (VC) emerged as a significant domain of research in the field of speech synthesis in recent years due to its emerging application in voice-assisting technology, automated movie dubbing, and speech-to-singing conversion to…

Sound · Computer Science 2021-04-27 Sandipan Dhar, Nanda Dulal Jana, Swagatam Das

High-fidelity speech can be synthesized by end-to-end text-to-speech models in recent years. However, accessing and controlling speech attributes such as speaker identity, prosody, and emotion in a text-to-speech system remains a challenge.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-05 Zexin Cai, Chuxiong Zhang, Ming Li

The goal of cross-speaker style transfer in TTS is to transfer a speech style from a source speaker with expressive data to a target speaker with only neutral data. In this context, we propose using a pre-trained singing voice conversion…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-10 Leonardo B. de M. M. Marques, Lucas H. Ueda, Mário U. Neto, Flávio O. Simões, Fernando Runstein, Bianca Dal Bó, Paula D. P. Costa

This paper focuses on using voice conversion (VC) to improve the speech intelligibility of surgical patients who have had parts of their articulators removed. Due to the difficulty of data collection, VC without parallel data is highly…

Audio and Speech Processing · Electrical Eng. & Systems 2019-08-26 Li-Wei Chen, Hung-Yi Lee, Yu Tsao

We propose an unsupervised learning method to disentangle speech into content representation and speaker identity representation. We apply this method to the challenging one-shot cross-lingual voice conversion task to demonstrate the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-26 Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen Meng

Speech signals contain a lot of sensitive information, such as the speaker's identity, which raises privacy concerns when speech data get collected. Speaker anonymization aims to transform a speech signal to remove the source speaker's…

Sound · Computer Science 2023-01-16 Pierre Champion, Denis Jouvet, Anthony Larcher

This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims at converting the voice characteristics of an utterance from any source speaker to a newly coming target speaker, solely…

Sound · Computer Science 2023-09-19 Zheng-Yan Sheng, Yang Ai, Yan-Nian Chen, Zhen-Hua Ling

This paper proposes a voice conversion (VC) method using sequence-to-sequence (seq2seq or S2S) learning, which flexibly converts not only the voice characteristics but also the pitch contour and duration of input speech. The proposed…

Sound · Computer Science 2020-10-08 Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, Takuhiro Kaneko, Nobukatsu Hojo

Human speech can be characterized by different components, including semantic content, speaker identity and prosodic information. Significant progress has been made in disentangling representations for semantic content and speaker identity…

Sound · Computer Science 2023-09-27 Leyuan Qu, Taihao Li, Cornelius Weber, Theresa Pekarek-Rosin, Fuji Ren, Stefan Wermter

We present a neural analysis and synthesis (NANSY) framework that can manipulate voice, pitch, and speed of an arbitrary speech signal. Most of the previous works have focused on using information bottleneck to disentangle analysis features…

Sound · Computer Science 2021-10-29 Hyeong-Seok Choi, Juheon Lee, Wansoo Kim, Jie Hwan Lee, Hoon Heo, Kyogu Lee

While expressive speech synthesis or voice conversion systems mainly focus on controlling or manipulating abstract prosodic characteristics of speech, such as emotion or accent, we here address the control of perceptual voice qualities…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-16 Frederik Rautenberg, Michael Kuhlmann, Fritz Seebauer, Jana Wiechmann, Petra Wagner, Reinhold Haeb-Umbach

Non-parallel voice conversion (VC) is typically achieved using lossy representations of the source speech. However, ensuring only speaker identity information is dropped whilst all other information from the source speech is retained is a…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-16 Thomas Merritt, Abdelhamid Ezzerg, Piotr Biliński, Magdalena Proszewska, Kamil Pokora, Roberto Barra-Chicote, Daniel Korzekwa

Self-supervised learning (SSL) has shown significant progress in speech processing tasks. However, despite the intrinsic randomness in the Transformer structure, such as dropout variants and layer-drop, improving the model-level consistency…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-16 Ji Won Yoon, Seok Min Kim, Nam Soo Kim

Voice conversion (VC) could be used to improve speech recognition systems in low-resource languages by using it to augment limited training data. However, VC has not been widely used for this purpose because of practical issues such as…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-22 Matthew Baas, Herman Kamper

Though significant progress has been made for the voice conversion (VC) of typical speech, VC for atypical speech, e.g., dysarthric and second-language (L2) speech, remains a challenge, since it involves correcting for atypical prosody…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-26 Disong Wang, Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng