Related papers: Learning Modality-Specific Representations with Se…

200 papers

Multimodal representation learning poses significant challenges in capturing informative and distinct features from multiple modalities. Existing methods often struggle to exploit the unique characteristics of each modality due to unified…

Computer Vision and Pattern Recognition · Computer Science 2023-11-08 Cam-Van Thi Nguyen, Ngoc-Hoa Thi Nguyen, Duc-Trong Le, Quang-Thuy Ha

In line with the latest research, the task of identifying helpful reviews from a vast pool of user-generated textual and visual data has become a prominent area of study. Effective modal representations are expected to possess two key…

Multimedia · Computer Science 2024-03-26 HongLin Gong, Mengzhao Jia, Liqiang Jing

Recently, self-supervised learning has seen explosive growth and use in a variety of machine learning tasks because of its ability to avoid the cost of annotating large-scale datasets. This paper gives an overview of the best self-supervised…

Machine Learning · Computer Science 2022-10-21 Naman Goyal

Multimodal sentiment analysis aims to effectively integrate information from various sources to infer sentiment, where in many cases there are no annotations for unimodal labels. Therefore, most works rely on multimodal labels for training.…

Machine Learning · Computer Science 2024-09-16 Sijie Mai, Yu Zhao, Ying Zeng, Jianhua Yao, Haifeng Hu

Learning multimodal representations is a fundamentally complex research problem due to the presence of multiple heterogeneous sources of information. Although the presence of multiple modalities provides additional valuable information,…

Machine Learning · Computer Science 2019-05-15 Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, Ruslan Salakhutdinov

Multimodal learning, which aims to understand and analyze information from multiple modalities, has achieved substantial progress in the supervised regime in recent years. However, the heavy dependence on data paired with expensive human…

Machine Learning · Computer Science 2024-08-19 Yongshuo Zong, Oisin Mac Aodha, Timothy Hospedales

Designing an effective representation learning method for multimodal sentiment analysis tasks is a crucial research direction. The challenge lies in learning both shared and private information in a complete modal representation, which is…

Computation and Language · Computer Science 2024-03-20 Songning Lai, Jiakang Li, Guinan Guo, Xifeng Hu, Yulong Li, Yuan Tan, Zichen Song, Yutong Liu, Zhaoxia Ren, Chun Wan, Danmin Miao, Zhi Liu

Modality representation learning is an important problem for multimodal sentiment analysis (MSA), since highly distinguishable representations can contribute to improving the analysis effect. Previous MSA works have usually focused…

Multimedia · Computer Science 2023-01-31 Peipei Liu, Xin Zheng, Hong Li, Jie Liu, Yimo Ren, Hongsong Zhu, Limin Sun

With the increasing multimedia information, multimodal recommendation has received extensive attention. It utilizes multimodal information to alleviate the data sparsity problem in recommendation systems, thus improving recommendation…

Information Retrieval · Computer Science 2024-03-01 Jinfeng Xu, Zheyu Chen, Shuo Yang, Jinze Li, Hewei Wang, Edith C.-H. Ngai

One of the key factors of enabling machine learning models to comprehend and solve real-world tasks is to leverage multimodal data. Unfortunately, annotation of multimodal data is challenging and expensive. Recently, self-supervised…

Computer Vision and Pattern Recognition · Computer Science 2020-12-11 Elad Amrani, Rami Ben-Ari, Daniel Rotman, Alex Bronstein

Multimodal learning aims to imitate human beings to acquire complementary information from multiple modalities for various downstream tasks. However, traditional aggregation-based multimodal fusion methods ignore the inter-modality…

Computer Vision and Pattern Recognition · Computer Science 2023-05-17 Heqing Zou, Meng Shen, Chen Chen, Yuchen Hu, Deepu Rajan, Eng Siong Chng

The online emergence of multi-modal sharing platforms (e.g., TikTok, YouTube) is powering personalized recommender systems to incorporate various modalities (e.g., visual, textual and acoustic) into the latent user representations. While…

Information Retrieval · Computer Science 2023-07-19 Wei Wei, Chao Huang, Lianghao Xia, Chuxu Zhang

Multimodal machine learning is a core research area spanning the language, visual and acoustic modalities. The central challenge in multimodal learning involves learning representations that can process and relate information from multiple…

Computation and Language · Computer Science 2018-08-07 Hai Pham, Thomas Manzini, Paul Pu Liang, Barnabas Poczos

Self-supervised learning is an efficient pre-training method for medical image analysis. However, current research is mostly confined to specific-modality data pre-training, consuming considerable time and resources without achieving…

Computer Vision and Pattern Recognition · Computer Science 2023-12-01 Yiwen Ye, Yutong Xie, Jianpeng Zhang, Ziyang Chen, Qi Wu, Yong Xia

This work focuses on learning useful and robust deep world models using multiple, possibly unreliable, sensors. We find that current methods do not sufficiently encourage a shared representation between modalities; this can cause poor…

Machine Learning · Computer Science 2021-07-07 Kaiqi Chen, Yong Lee, Harold Soh

Multi-modal Multi-label Emotion Recognition (MMER) aims to identify various human emotions from heterogeneous visual, audio and text modalities. Previous methods mainly focus on projecting multiple modalities into a common latent space and…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Yi Zhang, Mingyuan Chen, Jundong Shen, Chongjun Wang

In many machine learning systems that jointly learn from multiple modalities, a core research question is to understand the nature of multimodal interactions: how modalities combine to provide new task-relevant information that was not…

Automatic emotion recognition is an active research topic with a wide range of applications. Due to the high manual annotation cost and inevitable label ambiguity, the development of emotion recognition datasets is limited in both scale and…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-08 Jingjun Liang, Ruichen Li, Qin Jin

Multimodal networks have demonstrated remarkable performance improvements over their unimodal counterparts. Existing multimodal networks are designed in a multi-branch fashion that, due to the reliance on fusion strategies, exhibit…

Multimodal learning systems often face substantial uncertainty due to noisy data, low-quality labels, and heterogeneous modality characteristics. These issues become especially critical in human-computer interaction settings, where data…

Artificial Intelligence · Computer Science 2025-11-21 Hyo-Jeong Jang