Related papers: Multimodal Speech Emotion Recognition Using Modali…

Multi-Modal Emotion Recognition by Text, Speech and Video Using Pretrained Transformers

Due to the complex nature of human emotions and the diversity of emotion representation methods in humans, emotion recognition is a challenging field. In this research, three input modalities, namely text, audio (speech), and video, are…

Artificial Intelligence · Computer Science 2024-02-13 Minoo Shayaninasab, Bagher Babaali

Bimodal Speech Emotion Recognition Using Pre-Trained Language Models

Speech emotion recognition is a challenging task and an important step towards more natural human-machine interaction. We show that pre-trained language models can be fine-tuned for text emotion recognition, achieving an accuracy of 69.5%…

Audio and Speech Processing · Electrical Eng. & Systems 2019-12-06 Verena Heusser, Niklas Freymuth, Stefan Constantin, Alex Waibel

An Empirical Study and Improvement for Speech Emotion Recognition

Multimodal speech emotion recognition aims to detect speakers' emotions from audio and text. Prior works mainly focus on exploiting advanced networks to model and fuse different modality information to facilitate performance, while…

Computation and Language · Computer Science 2023-04-11 Zhen Wu, Yizhe Lu, Xinyu Dai

Self-Supervised learning with cross-modal transformers for emotion recognition

Emotion recognition is a challenging task due to limited availability of in-the-wild labeled datasets. Self-supervised learning has shown improvements on tasks with limited labeled datasets in domains like speech and natural language.…

Computation and Language · Computer Science 2021-04-08 Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram

Learning Alignment for Multimodal Emotion Recognition from Speech

Speech emotion recognition is a challenging problem because humans convey emotions in subtle and complex ways. For emotion recognition on human speech, one can either extract emotion related features from audio signals or employ speech…

Computation and Language · Computer Science 2020-04-06 Haiyang Xu, Hui Zhang, Kun Han, Yun Wang, Yiping Peng, Xiangang Li

Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching

Automatic emotion recognition is an active research topic with a wide range of applications. Due to the high manual annotation cost and inevitable label ambiguity, the development of emotion recognition datasets is limited in both scale and…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-08 Jingjun Liang, Ruichen Li, Qin Jin

Multi-Modal Emotion recognition on IEMOCAP Dataset using Deep Learning

Emotion recognition has become an important field of research in Human Computer Interactions as we improve upon the techniques for modelling the various aspects of behaviour. With the advancement of technology our understanding of emotions…

Artificial Intelligence · Computer Science 2019-11-11 Samarth Tripathi, Sarthak Tripathi, Homayoon Beigi

Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models

Automatic emotion recognition plays a key role in computer-human interaction as it has the potential to enrich the next-generation artificial intelligence with emotional intelligence. It finds applications in customer and/or representative…

Sound · Computer Science 2022-02-21 Sarala Padi, Seyed Omid Sadjadi, Dinesh Manocha, Ram D. Sriram

Multimodal Emotion Recognition with High-level Speech and Text Features

Automatic emotion recognition is one of the central concerns of the Human-Computer Interaction field as it can bridge the gap between humans and machines. Current works train deep learning models on low-level data representations to solve…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-22 Mariana Rodrigues Makiuchi, Kuniaki Uto, Koichi Shinoda

Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts

Emotion recognition and sentiment analysis are pivotal tasks in speech and language processing, particularly in real-world scenarios involving multi-party, conversational data. This paper presents a multimodal approach to tackle these…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Aref Farhadipour, Hossein Ranjbar, Masoumeh Chapariniya, Teodora Vukovic, Sarah Ebling, Volker Dellwo

MMER: Multimodal Multi-task Learning for Speech Emotion Recognition

In this paper, we propose MMER, a novel Multimodal Multi-task learning approach for Speech Emotion Recognition. MMER leverages a novel multimodal network based on early-fusion and cross-modal self-attention between text and acoustic…

Computation and Language · Computer Science 2023-06-06 Sreyan Ghosh, Utkarsh Tyagi, S Ramaneswaran, Harshvardhan Srivastava, Dinesh Manocha

Multilevel Transformer For Multimodal Emotion Recognition

Multimodal emotion recognition has attracted much attention recently. Fusing multiple modalities effectively with limited labeled data is a challenging task. Considering the success of pre-trained model and fine-grained nature of emotion…

Computation and Language · Computer Science 2023-03-02 Junyi He, Meimei Wu, Meng Li, Xiaobo Zhu, Feng Ye

Deep Multimodal Learning for Emotion Recognition in Spoken Language

In this paper, we present a novel deep multimodal framework to predict human emotions based on sentence-level spoken language. Our architecture has two distinctive characteristics. First, it extracts the high-level features from both text…

Computation and Language · Computer Science 2018-02-26 Yue Gu, Shuhong Chen, Ivan Marsic

Rethinking Multimodal Sentiment Analysis: A High-Accuracy, Simplified Fusion Architecture

Multimodal sentiment analysis, a pivotal task in affective computing, seeks to understand human emotions by integrating cues from language, audio, and visual signals. While many recent approaches leverage complex attention mechanisms and…

Computation and Language · Computer Science 2025-05-09 Nischal Mandal, Yang Li

M4SER: Multimodal, Multirepresentation, Multitask, and Multistrategy Learning for Speech Emotion Recognition

Multimodal speech emotion recognition (SER) has emerged as pivotal for improving human-machine interaction. Researchers are increasingly leveraging both speech and textual information obtained through automatic speech recognition (ASR) to…

Human-Computer Interaction · Computer Science 2025-09-24 Jiajun He, Xiaohan Shi, Cheng-Hung Hu, Jinyi Mi, Xingfeng Li, Tomoki Toda

MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition

Multimodal emotion recognition study is hindered by the lack of labelled corpora in terms of scale and diversity, due to the high annotation cost and label ambiguity. In this paper, we propose a pre-training model MEmoBERT for…

Computer Vision and Pattern Recognition · Computer Science 2021-11-02 Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li

Recognizing More Emotions with Less Data Using Self-supervised Transfer Learning

We propose a novel transfer learning method for speech emotion recognition allowing us to obtain promising results when only a small amount of training data is available. With as few as 125 examples per emotion class, we were able to reach a higher…

Machine Learning · Computer Science 2020-11-12 Jonathan Boigne, Biman Liyanage, Ted Östrem

Unsupervised Multimodal Language Representations using Convolutional Autoencoders

Multimodal Language Analysis is a demanding area of research, since it is associated with two requirements: combining different modalities and capturing temporal information. During the last years, several works have been proposed in the…

Computation and Language · Computer Science 2022-01-10 Panagiotis Koromilas, Theodoros Giannakopoulos

Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition

Emotion recognition based on Electroencephalography (EEG) has gained significant attention and diversified development in fields such as neural signal processing and affective computing. However, the unique brain anatomy of individuals…

Signal Processing · Electrical Eng. & Systems 2024-05-31 Yihang Dong, Xuhang Chen, Yanyan Shen, Michael Kwok-Po Ng, Tao Qian, Shuqiang Wang

Representation learning through cross-modal conditional teacher-student training for speech emotion recognition

Generic pre-trained speech and text representations promise to reduce the need for large labeled datasets on specific speech and language tasks. However, it is not clear how to effectively adapt these representations for speech emotion…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-28 Sundararajan Srinivasan, Zhaocheng Huang, Katrin Kirchhoff