Related papers: Improved Speech Emotion Recognition using Transfer…

A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model

In this paper, we propose to utilise diffusion models for data augmentation in speech emotion recognition (SER). In particular, we present an effective approach to utilise improved denoising diffusion probabilistic models (IDDPM) to…

Sound · Computer Science 2023-05-22 Ibrahim Malik, Siddique Latif, Raja Jurdak, Björn Schuller

Recognizing More Emotions with Less Data Using Self-supervised Transfer Learning

We propose a novel transfer learning method for speech emotion recognition allowing us to obtain promising results when only few training data is available. With as low as 125 examples per emotion class, we were able to reach a higher…

Machine Learning · Computer Science 2020-11-12 Jonathan Boigne, Biman Liyanage, Ted Östrem

A Transfer Learning Method for Speech Emotion Recognition from Automatic Speech Recognition

This paper presents a transfer learning method in speech emotion recognition based on a Time-Delay Neural Network (TDNN) architecture. A major challenge in the current speech-based emotion detection research is data scarcity. The proposed…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-18 Sitong Zhou, Homayoon Beigi

Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models

Automatic emotion recognition plays a key role in computer-human interaction as it has the potential to enrich the next-generation artificial intelligence with emotional intelligence. It finds applications in customer and/or representative…

Sound · Computer Science 2022-02-21 Sarala Padi, Seyed Omid Sadjadi, Dinesh Manocha, Ram D. Sriram

Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition

Speech Emotion Recognition (SER) presents a significant yet persistent challenge in human-computer interaction. While deep learning has advanced spoken language processing, achieving high performance on limited datasets remains a critical…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-03 Tai Vu

Fast Yet Effective Speech Emotion Recognition with Self-distillation

Speech emotion recognition (SER) is the task of recognising human's emotional states from speech. SER is extremely prevalent in helping dialogue systems to truly understand our emotions and become a trustworthy human conversational partner.…

Sound · Computer Science 2022-10-27 Zhao Ren, Thanh Tam Nguyen, Yi Chang, Björn W. Schuller

Continuous Metric Learning For Transferable Speech Emotion Recognition and Embedding Across Low-resource Languages

Speech emotion recognition (SER) refers to the technique of inferring the emotional state of an individual from speech signals. SERs continue to garner interest due to their wide applicability. Although the domain is mainly founded on…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-29 Sneha Das, Nicklas Leander Lund, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen

Speech Emotion Recognition Using Deep Sparse Auto-Encoder Extreme Learning Machine with a New Weighting Scheme and Spectro-Temporal Features Along with Classical Feature Selection and A New Quantum-Inspired Dimension Reduction Method

Affective computing is very important in the relationship between man and machine. In this paper, a system for speech emotion recognition (SER) based on speech signal is proposed, which uses new techniques in different stages of processing.…

Sound · Computer Science 2021-11-16 Fatemeh Daneshfar, Seyed Jahanshah Kabudian

Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection

Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction. However, building robust multilingual SER systems remains challenging due to the scarcity of…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-08 Hsi-Che Lin, Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee

A breakthrough in Speech emotion recognition using Deep Retinal Convolution Neural Networks

Speech emotion recognition (SER) is to study the formation and change of speaker's emotional state from the speech signal perspective, so as to make the interaction between human and computer more intelligent. SER is a challenging task that…

Sound · Computer Science 2017-08-01 Yafeng Niu, Dongsheng Zou, Yadong Niu, Zhongshi He, Hua Tan

Speaker Attentive Speech Emotion Recognition

Speech Emotion Recognition (SER) task has known significant improvements over the last years with the advent of Deep Neural Networks (DNNs). However, even the most successful methods are still rather failing when adaptation to specific…

Audio and Speech Processing · Electrical Eng. & Systems 2021-04-16 Clément Le Moine, Nicolas Obin, Axel Roebel

Toward Efficient Speech Emotion Recognition via Spectral Learning and Attention

Speech Emotion Recognition (SER) traditionally relies on auditory data analysis for emotion classification. Several studies have adopted different methods for SER. However, existing SER methods often struggle to capture subtle emotional…

Sound · Computer Science 2026-01-23 HyeYoung Lee, Muhammad Nadeem

Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured Learning

Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep…

Sound · Computer Science 2023-05-12 Yi Chang, Zhao Ren, Thanh Tam Nguyen, Kun Qian, Björn W. Schuller

An Efficient Transfer Learning Method Based on Adapter with Local Attributes for Speech Emotion Recognition

Existing speech emotion recognition (SER) methods commonly suffer from the lack of high-quality large-scale corpus, partly due to the complex, psychological nature of emotion which makes accurate labeling difficult and time consuming.…

Sound · Computer Science 2025-09-30 Haoyu Song, Ian McLoughlin, Qing Gu, Nan Jiang, Yan Song

CopyPaste: An Augmentation Method for Speech Emotion Recognition

Data augmentation is a widely used strategy for training robust machine learning models. It partially alleviates the problem of limited data for tasks like speech emotion recognition (SER), where collecting data is expensive and…

Sound · Computer Science 2021-02-12 Raghavendra Pappagari, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation

In Speech Emotion Recognition (SER), emotional characteristics often appear in diverse forms of energy patterns in spectrograms. Typical attention neural network classifiers of SER are usually optimized on a fixed attention granularity. In…

Sound · Computer Science 2021-02-04 Mingke Xu, Fan Zhang, Xiaodong Cui, Wei Zhang

Speech Emotion Recognition with Co-Attention based Multi-level Acoustic Information

Speech Emotion Recognition (SER) aims to help the machine to understand human's subjective emotion from only audio information. However, extracting and utilizing comprehensive in-depth audio information is still a challenging task. In this…

Sound · Computer Science 2022-03-30 Heqing Zou, Yuke Si, Chen Chen, Deepu Rajan, Eng Siong Chng

Speech Emotion Recognition via an Attentive Time-Frequency Neural Network

Spectrogram is commonly used as the input feature of deep neural networks to learn the high(er)-level time-frequency pattern of speech signal for speech emotion recognition (SER). Generally, different emotions correspond…

Sound · Computer Science 2022-10-25 Cheng Lu, Wenming Zheng, Hailun Lian, Yuan Zong, Chuangao Tang, Sunan Li, Yan Zhao

Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting

Significant advances are being made in speech emotion recognition (SER) using deep learning models. Nonetheless, training SER systems remains challenging, requiring both time and costly resources. Like many other machine learning tasks,…

Sound · Computer Science 2023-09-18 Tiantian Feng, Shrikanth Narayanan

Charting 15 years of progress in deep learning for speech emotion recognition: A replication study

Speech emotion recognition (SER) has long benefited from the adoption of deep learning methodologies. Deeper models -- with more layers and more trainable parameters -- are generally perceived as being 'better' by the SER community. This…

Sound · Computer Science 2025-08-05 Andreas Triantafyllopoulos, Anton Batliner, Björn W. Schuller