Related papers: Multi-modal embeddings using multi-task learning f…

Multimodal Embeddings from Language Models

Word embeddings such as ELMo have recently been shown to model word semantics with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant improvement in state of the art across many…

Computation and Language · Computer Science 2019-09-11 Shao-Yen Tseng, Panayiotis Georgiou, Shrikanth Narayanan

Unsupervised Multimodal Language Representations using Convolutional Autoencoders

Multimodal Language Analysis is a demanding area of research, since it is associated with two requirements: combining different modalities and capturing temporal information. During the last years, several works have been proposed in the…

Computation and Language · Computer Science 2022-01-10 Panagiotis Koromilas, Theodoros Giannakopoulos

Self-Supervised learning with cross-modal transformers for emotion recognition

Emotion recognition is a challenging task due to limited availability of in-the-wild labeled datasets. Self-supervised learning has shown improvements on tasks with limited labeled datasets in domains like speech and natural language.…

Computation and Language · Computer Science 2021-04-08 Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram

Emo2Vec: Learning Generalized Emotion Representation by Multi-task Training

In this paper, we propose Emo2Vec which encodes emotional semantics into vectors. We train Emo2Vec by multi-task learning six different emotion-related tasks, including emotion/sentiment analysis, sarcasm classification, stress detection,…

Computation and Language · Computer Science 2018-09-13 Peng Xu, Andrea Madotto, Chien-Sheng Wu, Ji Ho Park, Pascale Fung

Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data

Emotion recognition has become a popular topic of interest, especially in the field of human computer interaction. Previous works involve unimodal analysis of emotion, while recent efforts focus on multi-modal emotion recognition from…

Computation and Language · Computer Science 2019-03-11 Chan Woo Lee, Kyu Ye Song, Jihoon Jeong, Woo Yong Choi

From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models

Word embeddings and language models have transformed natural language processing (NLP) by facilitating the representation of linguistic elements in continuous vector spaces. This review visits foundational concepts such as the…

Computation and Language · Computer Science 2025-12-03 Charles Zhang, Benji Peng, Xintian Sun, Qian Niu, Junyu Liu, Keyu Chen, Ming Li, Pohsun Feng, Ziqian Bi, Ming Liu, Yichao Zhang, Xinyuan Song, Cheng Fei, Caitlyn Heqi Yin, Lawrence KQ Yan, Hongyang He, Tianyang Wang

Multimodal Speech Emotion Recognition Using Modality-specific Self-Supervised Frameworks

Emotion recognition is a topic of significant interest in assistive robotics due to the need to equip robots with the ability to comprehend human behavior, facilitating their effective interaction in our society. Consequently, efficient and…

Human-Computer Interaction · Computer Science 2023-12-05 Rutherford Agbeshi Patamia, Paulo E. Santos, Kingsley Nketia Acheampong, Favour Ekong, Kwabena Sarpong, She Kun

Multi-Modal Emotion Recognition by Text, Speech and Video Using Pretrained Transformers

Due to the complex nature of human emotions and the diversity of emotion representation methods in humans, emotion recognition is a challenging field. In this research, three input modalities, namely text, audio (speech), and video, are…

Artificial Intelligence · Computer Science 2024-02-13 Minoo Shayaninasab, Bagher Babaali

Learning Alignment for Multimodal Emotion Recognition from Speech

Speech emotion recognition is a challenging problem because humans convey emotions in subtle and complex ways. For emotion recognition on human speech, one can either extract emotion related features from audio signals or employ speech…

Computation and Language · Computer Science 2020-04-06 Haiyang Xu, Hui Zhang, Kun Han, Yun Wang, Yiping Peng, Xiangang Li

emoji2vec: Learning Emoji Representations from their Description

Many current natural language processing applications for social media rely on representation learning and utilize pre-trained word embeddings. There currently exist several publicly-available, pre-trained sets of word embeddings, but they…

Computation and Language · Computer Science 2016-11-22 Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, Sebastian Riedel

Learned In Speech Recognition: Contextual Acoustic Word Embeddings

End-to-end acoustic-to-word speech recognition models have recently gained popularity because they are easy to train, scale well to large amounts of training data, and do not require a lexicon. In addition, word models may also be easier to…

Computation and Language · Computer Science 2019-02-20 Shruti Palaskar, Vikas Raunak, Florian Metze

Emotional Embeddings: Refining Word Embeddings to Capture Emotional Content of Words

Word embeddings are one of the most useful tools in any modern natural language processing expert's toolkit. They contain various types of information about each word which makes them the best way to represent the terms in any NLP task. But…

Computation and Language · Computer Science 2019-06-20 Armin Seyeditabari, Narges Tabari, Shafie Gholizade, Wlodek Zadrozny

Multi-Modal Emotion Detection with Transfer Learning

Automated emotion detection in speech is a challenging task due to the complex interdependence between words and the manner in which they are spoken. It is made more difficult by the available datasets; their small size and incompatible…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-16 Amith Ananthram, Kailash Karthik Saravanakumar, Jessica Huynh, Homayoon Beigi

Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis

Related tasks often depend on each other and perform better when solved in a joint framework. In this paper, we present a deep multi-task learning framework that jointly performs both sentiment and emotion analysis. The…

Computation and Language · Computer Science 2019-05-16 Md Shad Akhtar, Dushyant Singh Chauhan, Deepanway Ghosal, Soujanya Poria, Asif Ekbal, Pushpak Bhattacharyya

MMER: Multimodal Multi-task Learning for Speech Emotion Recognition

In this paper, we propose MMER, a novel Multimodal Multi-task learning approach for Speech Emotion Recognition. MMER leverages a novel multimodal network based on early-fusion and cross-modal self-attention between text and acoustic…

Computation and Language · Computer Science 2023-06-06 Sreyan Ghosh, Utkarsh Tyagi, S Ramaneswaran, Harshvardhan Srivastava, Dinesh Manocha

Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition

Despite the recent achievements made in the multi-modal emotion recognition task, two problems still exist and have not been well investigated: 1) the relationships between different emotion categories are not utilized, which leads to…

Computation and Language · Computer Science 2020-10-08 Wenliang Dai, Zihan Liu, Tiezheng Yu, Pascale Fung

Audio-Linguistic Embeddings for Spoken Sentences

We propose spoken sentence embeddings which capture both acoustic and linguistic content. While existing works operate at the character, phoneme, or word level, our method learns long-term dependencies by modeling speech at the sentence…

Sound · Computer Science 2019-02-22 Albert Haque, Michelle Guo, Prateek Verma, Li Fei-Fei

A Survey on Contextual Embeddings

Contextual embeddings, such as ELMo and BERT, move beyond global word representations like Word2Vec and achieve ground-breaking performance on a wide range of natural language processing tasks. Contextual embeddings assign each word a…

Computation and Language · Computer Science 2020-04-14 Qi Liu, Matt J. Kusner, Phil Blunsom

Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings

Emotion recognition datasets are relatively small, making the use of the more sophisticated deep learning approaches challenging. In this work, we propose a transfer learning method for speech emotion recognition where features extracted…

Sound · Computer Science 2021-04-09 Leonardo Pepino, Pablo Riera, Luciana Ferrer

Multimodal Emotion Recognition with High-level Speech and Text Features

Automatic emotion recognition is one of the central concerns of the Human-Computer Interaction field as it can bridge the gap between humans and machines. Current works train deep learning models on low-level data representations to solve…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-22 Mariana Rodrigues Makiuchi, Kuniaki Uto, Koichi Shinoda