Related papers: Multi-speaker Emotion Conversion via Latent Variab…

Multi-Modal Emotion Detection with Transfer Learning

Automated emotion detection in speech is a challenging task due to the complex interdependence between words and the manner in which they are spoken. It is made more difficult by the available datasets; their small size and incompatible…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-16 Amith Ananthram, Kailash Karthik Saravanakumar, Jessica Huynh, Homayoon Beigi

Textless Speech Emotion Conversion using Discrete and Decomposed Representations

Speech emotion conversion is the task of modifying the perceived emotion of a speech utterance while preserving the lexical content and speaker identity. In this study, we cast the problem of emotion conversion as a spoken language…

Computation and Language · Computer Science 2022-12-14 Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

Emotion-Aware Transformer Encoder for Empathetic Dialogue Generation

Modern-day conversational agents are trained to emulate the manner in which humans communicate. To emotionally bond with the user, these virtual agents need to be aware of the affective state of the user. Transformers are the recent state…

Sound · Computer Science 2022-04-26 Raman Goel, Seba Susan, Sachin Vashisht, Armaan Dhanda

EmoFormer: A Text-Independent Speech Emotion Recognition using a Hybrid Transformer-CNN model

Speech Emotion Recognition is a crucial area of research in human-computer interaction. While significant work has been done in this field, many state-of-the-art networks struggle to accurately recognize emotions in speech when the data is…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-23 Rashedul Hasan, Meher Nigar, Nursadul Mamun, Sayan Paul

Affective Neural Response Generation

Existing neural conversational models process natural language primarily on a lexico-syntactic level, thereby ignoring one of the most crucial components of human-to-human dialogue: its affective content. We take a step in this direction by…

Computation and Language · Computer Science 2017-09-14 Nabiha Asghar, Pascal Poupart, Jesse Hoey, Xin Jiang, Lili Mou

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion

Emotional voice conversion aims to convert the emotion of speech from one state to another while preserving the linguistic content and speaker identity. Prior studies on emotional voice conversion are mostly carried out under the…

Sound · Computer Science 2020-10-14 Kun Zhou, Berrak Sisman, Mingyang Zhang, Haizhou Li

Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation

A general disentanglement-based speaker anonymization system typically separates speech into content, speaker, and prosody features using individual encoders. This paper explores how to adapt such a system when a new speech attribute, for…

Sound · Computer Science 2025-04-24 Xiaoxiao Miao, Yuxiang Zhang, Xin Wang, Natalia Tomashenko, Donny Cheng Lock Soh, Ian McLoughlin

SEDTalker: Emotion-Aware 3D Facial Animation Using Frame-Level Speech Emotion Diarization

We introduce SEDTalker, an emotion-aware framework for speech-driven 3D facial animation that leverages frame-level speech emotion diarization to achieve fine-grained expressive control. Unlike prior approaches that rely on utterance-level…

Computer Vision and Pattern Recognition · Computer Science 2026-04-16 Farzaneh Jafari, Stefano Berretti, Anup Basu

EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion

Emotional Voice Conversion (EVC) aims to convert the discrete emotional state of a given speech utterance from the source emotion to the target emotion while preserving its linguistic content. In this paper, we propose regularizing emotion…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-31 Ashishkumar Gudmalwar, Ishan D. Biyani, Nirmesh Shah, Pankaj Wasnik, Rajiv Ratn Shah

Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model

Expressive voice conversion (VC) conducts speaker identity conversion for emotional speakers by jointly converting speaker identity and emotional style. Emotional style modeling for arbitrary speakers in expressive VC has not been…

Audio and Speech Processing · Electrical Eng. & Systems 2024-05-06 Zongyang Du, Junchen Lu, Kun Zhou, Lakshmish Kaushik, Berrak Sisman

Multi-Target Emotional Voice Conversion With Neural Vocoders

Emotional voice conversion (EVC) is one way to generate expressive synthetic speech. Previous approaches mainly focused on modeling one-to-one mapping, i.e., conversion from one emotional state to another emotional state, with Mel-cepstral…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-09 Songxiang Liu, Yuewen Cao, Helen Meng

Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition

The mainstream paradigm of speech emotion recognition (SER) is identifying a single emotion label for the entire utterance. This line of work neglects the emotion dynamics at fine temporal granularity and mostly fails to leverage linguistic…

Sound · Computer Science 2024-03-29 Siyuan Shen, Yu Gao, Feng Liu, Hanyang Wang, Aimin Zhou

In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis

Speech emotion conversion aims to convert the expressed emotion of a spoken utterance to a target emotion while preserving the lexical information and the speaker's identity. In this work, we specifically focus on in-the-wild emotion…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-06 Navin Raj Prabhu, Nale Lehmann-Willenbrock, Timo Gerkmann

Fast-VGAN: Lightweight Voice Conversion with Explicit Control of F0 and Duration Parameters

Precise control over speech characteristics, such as pitch, duration, and speech rate, remains a significant challenge in the field of voice conversion. The ability to manipulate parameters like pitch and syllable rate is an important…

Sound · Computer Science 2025-07-08 Mathilde Abrassart, Nicolas Obin, Axel Roebel

Learning Discriminative features using Center Loss and Reconstruction as Regularizer for Speech Emotion Recognition

This paper proposes a Convolutional Neural Network (CNN) inspired by Multitask Learning (MTL) and based on speech features trained under the joint supervision of softmax loss and center loss, a powerful metric learning strategy, for the…

Sound · Computer Science 2019-09-04 Suraj Tripathi, Abhiram Ramesh, Abhay Kumar, Chirag Singh, Promod Yenigalla

Hierarchical Transformer Network for Utterance-level Emotion Recognition

While there have been significant advances in detecting emotions in text, in the field of utterance-level emotion recognition (ULER), there are still many problems to be solved. In this paper, we address some challenges in ULER in dialog…

Computation and Language · Computer Science 2020-02-19 QingBiao Li, ChunHua Wu, KangFeng Zheng, Zhe Wang

Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech

The emotional state of a speaker has been found to have a significant effect on speech production, causing speech to deviate from that produced in a neutral state. This makes identifying speakers with different emotions a challenging task as generally…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-09 Biswajit Dev Sarma, Rohan Kumar Das

Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing

Large language models are routinely deployed on text that varies widely in emotional tone, yet their reasoning behavior is typically evaluated without accounting for emotion as a source of representational variation. Prior work has largely…

Computation and Language · Computer Science 2026-03-17 Benjamin Reichman, Adar Avsian, Samuel Webster, Larry Heck

Emotion Intensity and its Control for Emotional Voice Conversion

Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity. In EVC, emotions are usually treated as discrete categories, overlooking the fact that speech…

Sound · Computer Science 2022-07-19 Kun Zhou, Berrak Sisman, Rajib Rana, Björn W. Schuller, Haizhou Li

A Diffeomorphic Flow-based Variational Framework for Multi-speaker Emotion Conversion

This paper introduces a new framework for non-parallel emotion conversion in speech. Our framework is based on two key contributions. First, we propose a stochastic version of the popular CycleGAN model. Our modified loss function…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-10 Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman