Related papers: Unsupervised Multimodal Language Representations u…

Multimodal Representation Learning With Text and Images

In recent years, multimodal AI has seen an upward trend as researchers integrate data of different types, such as text, images, and speech, into modelling to get the best results. This project leverages multimodal AI and matrix…

Machine Learning · Computer Science · 2022-05-03 · Aishwarya Jayagopal, Ankireddy Monica Aiswarya, Ankita Garg, Srinivasan Kolumam Nandakumar

Multimodal Speech Emotion Recognition Using Modality-specific Self-Supervised Frameworks

Emotion recognition is a topic of significant interest in assistive robotics due to the need to equip robots with the ability to comprehend human behavior, facilitating their effective interaction in our society. Consequently, efficient and…

Human-Computer Interaction · Computer Science · 2023-12-05 · Rutherford Agbeshi Patamia, Paulo E. Santos, Kingsley Nketia Acheampong, Favour Ekong, Kwabena Sarpong, She Kun

Self-Supervised learning with cross-modal transformers for emotion recognition

Emotion recognition is a challenging task due to limited availability of in-the-wild labeled datasets. Self-supervised learning has shown improvements on tasks with limited labeled datasets in domains like speech and natural language.…

Computation and Language · Computer Science · 2021-04-08 · Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram

Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks

In this work, we focus on unsupervised vision-language-action mapping in the area of robotic manipulation. Recently, multiple approaches employing pre-trained large language and vision models have been proposed for this task. However, they…

Robotics · Computer Science · 2025-05-29 · Gabriela Sejnova, Michal Vavrecka, Karla Stepanova

Multi-modal embeddings using multi-task learning for emotion recognition

General embeddings like word2vec, GloVe and ELMo have been highly successful in natural language tasks. The embeddings are typically extracted from models that are built on general tasks such as skip-gram models and natural language…

Computation and Language · Computer Science · 2020-11-03 · Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram

Unsupervised Multi-Modal Representation Learning for Affective Computing with Multi-Corpus Wearable Data

With recent developments in smart technologies, there has been a growing focus on the use of artificial intelligence and machine learning for affective computing to further enhance the user experience through emotion recognition. Typically,…

Machine Learning · Computer Science · 2020-08-26 · Kyle Ross, Paul Hungler, Ali Etemad

Universal Multimodal Representation for Language Understanding

Representation learning is the foundation of natural language processing (NLP). This work presents new methods to employ visual information as assistant signals to general NLP tasks. For each sentence, we first retrieve a flexible number of…

Computation and Language · Computer Science · 2023-01-10 · Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Models

We propose an unsupervised method to obtain cross-lingual embeddings without any parallel data or pre-trained word embeddings. The proposed model, which we call multilingual neural language models, takes sentences of multiple languages as…

Computation and Language · Computer Science · 2018-09-10 · Takashi Wada, Tomoharu Iwata

Multimodal Embeddings from Language Models

Word embeddings such as ELMo have recently been shown to model word semantics with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant improvements to the state of the art across many…

Computation and Language · Computer Science · 2019-09-11 · Shao-Yen Tseng, Panayiotis Georgiou, Shrikanth Narayanan

Learning to Predict: A Fast Re-constructive Method to Generate Multimodal Embeddings

Integrating visual and linguistic information into a single multimodal representation is an unsolved problem with wide-reaching applications to both natural language processing and computer vision. In this paper, we present a simple method…

Machine Learning · Statistics · 2017-03-28 · Guillem Collell, Teddy Zhang, Marie-Francine Moens

An Autoencoder Approach to Learning Bilingual Word Representations

Cross-language learning allows us to use training data from one language to build models for a different language. Many approaches to bilingual learning require that we have word-level alignment of sentences from parallel corpora. In this…

Computation and Language · Computer Science · 2014-02-07 · Sarath Chandar A P, Stanislas Lauly, Hugo Larochelle, Mitesh M. Khapra, Balaraman Ravindran, Vikas Raykar, Amrita Saha

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

Deep learning methods have revolutionized speech recognition, image recognition, and natural language processing since 2010. Each of these tasks involves a single modality in its input signals. However, many applications in the artificial…

Artificial Intelligence · Computer Science · 2020-07-15 · Chao Zhang, Zichao Yang, Xiaodong He, Li Deng

Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition

Training Transformer-based models demands a large amount of data, while obtaining aligned and labelled multimodal data is costly, especially for audio-visual speech recognition (AVSR). Thus, it makes a lot of sense to…

Sound · Computer Science · 2022-03-29 · Xichen Pan, Peiyu Chen, Yichen Gong, Helong Zhou, Xinbing Wang, Zhouhan Lin

A Simple Approach to Learning Unsupervised Multilingual Embeddings

Recent progress on unsupervised learning of cross-lingual embeddings in bilingual setting has given impetus to learning a shared embedding space for several languages without any supervision. A popular framework to solve the latter problem…

Computation and Language · Computer Science · 2020-04-21 · Pratik Jawanpuria, Mayank Meghwanshi, Bamdev Mishra

Seq2Seq2Sentiment: Multimodal Sequence to Sequence Models for Sentiment Analysis

Multimodal machine learning is a core research area spanning the language, visual and acoustic modalities. The central challenge in multimodal learning involves learning representations that can process and relate information from multiple…

Computation and Language · Computer Science · 2018-08-07 · Hai Pham, Thomas Manzini, Paul Pu Liang, Barnabas Poczos

Learning Multilingual Word Representations using a Bag-of-Words Autoencoder

Recent work on learning multilingual word representations usually relies on the use of word-level alignments (e.g. inferred with the help of GIZA++) between translated sentences, in order to align the word embeddings in different languages.…

Computation and Language · Computer Science · 2014-01-09 · Stanislas Lauly, Alex Boulanger, Hugo Larochelle

Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders

We present an approach to learning multi-sense word embeddings relying both on monolingual and bilingual information. Our model consists of an encoder, which uses monolingual and bilingual context (i.e. a parallel sentence) to choose a…

Computation and Language · Computer Science · 2016-03-31 · Simon Šuster, Ivan Titov, Gertjan van Noord

Learning Alignment for Multimodal Emotion Recognition from Speech

Speech emotion recognition is a challenging problem because humans convey emotions in subtle and complex ways. For emotion recognition on human speech, one can either extract emotion-related features from audio signals or employ speech…

Computation and Language · Computer Science · 2020-04-06 · Haiyang Xu, Hui Zhang, Kun Han, Yun Wang, Yiping Peng, Xiangang Li

Semi-supervised Multimodal Representation Learning through a Global Workspace

Recent deep learning models can efficiently combine inputs from different modalities (e.g., images and text) and learn to align their latent representations, or to translate signals from one domain to another (as in image captioning, or…

Artificial Intelligence · Computer Science · 2025-11-27 · Benjamin Devillers, Léopold Maytié, Rufin VanRullen

Multilingual Word Embeddings using Multigraphs

We present a family of neural-network-inspired models for computing continuous word representations, specifically designed to exploit both monolingual and multilingual text. This framework allows us to perform unsupervised training of…

Computation and Language · Computer Science · 2016-12-15 · Radu Soricut, Nan Ding