Related papers: Nonparallel Emotional Speech Conversion

EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data

Speech emotion conversion is the task of converting the expressed emotion of a spoken utterance to a target emotion while preserving the lexical content and speaker identity. While most existing works in speech emotion conversion rely on…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-09 Navin Raj Prabhu, Bunlong Lay, Simon Welker, Nale Lehmann-Willenbrock, Timo Gerkmann

Transferring Source Style in Non-Parallel Voice Conversion

Voice conversion (VC) techniques aim to modify speaker identity of an utterance while preserving the underlying linguistic information. Most VC approaches ignore modeling of the speaking style (e.g. emotion and emphasis), which may contain…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-20 Songxiang Liu, Yuewen Cao, Shiyin Kang, Na Hu, Xunying Liu, Dan Su, Dong Yu, Helen Meng

Textless Speech Emotion Conversion using Discrete and Decomposed Representations

Speech emotion conversion is the task of modifying the perceived emotion of a speech utterance while preserving the lexical content and speaker identity. In this study, we cast the problem of emotion conversion as a spoken language…

Computation and Language · Computer Science 2022-12-14 Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing

The primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a given speech signal from one style to another without modifying the linguistic content of the signal. Most of the state-of-the-art approaches…

Sound · Computer Science 2023-02-22 Nirmesh Shah, Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion

Emotional voice conversion aims to convert the emotion of speech from one state to another while preserving the linguistic content and speaker identity. The prior studies on emotional voice conversion are mostly carried out under the…

Sound · Computer Science 2020-10-14 Kun Zhou, Berrak Sisman, Mingyang Zhang, Haizhou Li

Improved Neural Text Attribute Transfer with Non-parallel Data

Text attribute transfer using non-parallel data requires methods that can perform disentanglement of content and linguistic attributes. In this work, we propose multiple improvements over the existing approaches that enable the…

Computation and Language · Computer Science 2017-12-06 Igor Melnyk, Cicero Nogueira dos Santos, Kahini Wadhawan, Inkit Padhi, Abhishek Kumar

Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis

Zero-shot emotion transfer in cross-lingual speech synthesis aims to transfer emotion from an arbitrary speech reference in the source language to the synthetic speech in the target language. Building such a system faces challenges of…

Sound · Computer Science 2023-10-09 Yuke Li, Xinfa Zhu, Yi Lei, Hai Li, Junhui Liu, Danming Xie, Lei Xie

Learning in your voice: Non-parallel voice conversion based on speaker consistency loss

In this paper, we propose a novel voice conversion strategy to resolve the mismatch between the training and conversion scenarios when parallel speech corpus is unavailable for training. Based on auto-encoder and disentanglement frameworks,…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-05 Yoohwan Kwon, Soo-Whan Chung, Hee-Soo Heo, Hong-Goo Kang

Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

In expressive speech synthesis, there are high requirements for emotion interpretation. However, it is time-consuming to acquire emotional audio corpus for arbitrary speakers due to their deduction ability. In response to this problem, this…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-12 Pengfei Wu, Junjie Pan, Chenchang Xu, Junhui Zhang, Lin Wu, Xiang Yin, Zejun Ma

Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer

Humans can effortlessly modify various prosodic attributes, such as the placement of stress and the intensity of sentiment, to convey a specific emotion while maintaining consistent linguistic content. Motivated by this capability, we…

Sound · Computer Science 2023-12-29 Leyuan Qu, Wei Wang, Cornelius Weber, Pengcheng Yue, Taihao Li, Stefan Wermter

TranSentence: Speech-to-speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data

Although there has been significant advancement in the field of speech-to-speech translation, conventional models still require language-parallel speech data between the source and target languages for training. In this paper, we introduce…

Computation and Language · Computer Science 2024-03-21 Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee

Non-parallel Emotion Conversion using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator

We introduce a novel method for emotion conversion in speech that does not require parallel training data. Our approach loosely relies on a cycle-GAN schema to minimize the reconstruction error from converting back and forth between emotion…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-12 Ravi Shankar, Jacob Sager, Archana Venkataraman

ET-GAN: Cross-Language Emotion Transfer Based on Cycle-Consistent Generative Adversarial Networks

Despite the remarkable progress made in synthesizing emotional speech from text, it is still challenging to provide emotion information to existing speech segments. Previous methods mainly rely on parallel data, and few works have studied…

Sound · Computer Science 2020-03-06 Xiaoqi Jia, Jianwei Tai, Hang Zhou, Yakai Li, Weijuan Zhang, Haichao Du, Qingjia Huang

Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations

This paper presents a method of sequence-to-sequence (seq2seq) voice conversion using non-parallel training data. In this method, disentangled linguistic and speaker representations are extracted from acoustic features, and voice conversion…

Audio and Speech Processing · Electrical Eng. & Systems 2020-01-14 Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

Sentiment Transfer using Seq2Seq Adversarial Autoencoders

Expressing in language is subjective. Everyone has a different style of reading and writing; apparently it all boils down to the way their mind understands things (in a specific format). Language style transfer is a way to preserve the…

Computation and Language · Computer Science 2018-04-12 Ayush Singh, Ritu Palod

Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data

Emotional voice conversion aims to convert the spectrum and prosody to change the emotional patterns of speech, while preserving the speaker identity and linguistic content. Many studies require parallel speech data between different…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-27 Kun Zhou, Berrak Sisman, Haizhou Li

Cross-speaker Emotion Transfer by Manipulating Speech Style Latents

In recent years, emotional text-to-speech has shown considerable progress. However, it requires a large amount of labeled data, which is not easily accessible. Even if it is possible to acquire an emotional speech dataset, there is still a…

Sound · Computer Science 2023-03-16 Suhee Jo, Younggun Lee, Yookyung Shin, Yeongtae Hwang, Taesu Kim

Style Transfer from Non-Parallel Text by Cross-Alignment

This paper focuses on style transfer on the basis of non-parallel text. This is an instance of a broad family of problems including machine translation, decipherment, and sentiment modification. The key challenge is to separate the content…

Computation and Language · Computer Science 2017-11-07 Tianxiao Shen, Tao Lei, Regina Barzilay, Tommi Jaakkola

Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data

We propose a method for speech-to-speech emotion-preserving translation that operates at the level of discrete speech units. Our approach relies on the use of multilingual emotion embedding that can capture affective information in a…

Audio and Speech Processing · Electrical Eng. & Systems 2023-07-03 Jarod Duret, Titouan Parcollet, Yannick Estève

In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis

Speech emotion conversion aims to convert the expressed emotion of a spoken utterance to a target emotion while preserving the lexical information and the speaker's identity. In this work, we specifically focus on in-the-wild emotion…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-06 Navin Raj Prabhu, Nale Lehmann-Willenbrock, Timo Gerkmann