Related papers: Noise Estimation Using Density Estimation for Self…

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions

Multimodal deep learning systems which employ multiple modalities like text, image, audio, video, etc., are showing better performance in comparison with individual modalities (i.e., unimodal) systems. Multimodal machine learning involves…

Machine Learning · Computer Science 2022-01-19 Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha

Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments

A real-world application or setting involves interaction between different modalities (e.g., video, speech, text). In order to process the multimodal information automatically and use it for an end application, Multimodal Representation…

Computer Vision and Pattern Recognition · Computer Science 2022-11-08 Abhinav Joshi, Naman Gupta, Jinang Shah, Binod Bhattarai, Ashutosh Modi, Danail Stoyanov

Uncertainty-Resilient Multimodal Learning via Consistency-Guided Cross-Modal Transfer

Multimodal learning systems often face substantial uncertainty due to noisy data, low-quality labels, and heterogeneous modality characteristics. These issues become especially critical in human-computer interaction settings, where data…

Artificial Intelligence · Computer Science 2025-11-21 Hyo-Jeong Jang

Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis

Representation Learning is a significant and challenging task in multimodal learning. Effective modality representations should contain two parts of characteristics: the consistency and the difference. Due to the unified multimodal…

Computation and Language · Computer Science 2021-02-10 Wenmeng Yu, Hua Xu, Ziqi Yuan, Jiele Wu

Self-Supervised Multimodal Learning: A Survey

Multimodal learning, which aims to understand and analyze information from multiple modalities, has achieved substantial progress in the supervised regime in recent years. However, the heavy dependence on data paired with expensive human…

Machine Learning · Computer Science 2024-08-19 Yongshuo Zong, Oisin Mac Aodha, Timothy Hospedales

Noise-Tolerant Learning for Audio-Visual Action Recognition

Recently, video recognition is emerging with the help of multi-modal learning, which focuses on integrating distinct modalities to improve the performance or robustness of the model. Although various multi-modal learning methods have been…

Computer Vision and Pattern Recognition · Computer Science 2024-10-28 Haochen Han, Qinghua Zheng, Minnan Luo, Kaiyao Miao, Feng Tian, Yan Chen

Robust-MSA: Understanding the Impact of Modality Noise on Multimodal Sentiment Analysis

Improving model robustness against potential modality noise, as an essential step for adapting multimodal models to real-world applications, has received increasing attention among researchers. For Multimodal Sentiment Analysis (MSA), there…

Multimedia · Computer Science 2022-11-28 Huisheng Mao, Baozheng Zhang, Hua Xu, Ziqi Yuan, Yihe Liu

Unsupervised Multimodal Language Representations using Convolutional Autoencoders

Multimodal Language Analysis is a demanding area of research, since it is associated with two requirements: combining different modalities and capturing temporal information. During the last years, several works have been proposed in the…

Computation and Language · Computer Science 2022-01-10 Panagiotis Koromilas, Theodoros Giannakopoulos

Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings

Current meta-learning approaches focus on learning functional representations of relationships between variables, i.e. on estimating conditional expectations in regression. In many applications, however, we are faced with conditional…

Machine Learning · Statistics 2021-02-25 Jean-Francois Ton, Lucian Chan, Yee Whye Teh, Dino Sejdinovic

Learning Noise-Robust Joint Representation for Multimodal Emotion Recognition under Incomplete Data Scenarios

Multimodal emotion recognition (MER) in practical scenarios is significantly challenged by the presence of missing or incomplete data across different modalities. To overcome these challenges, researchers have aimed to simulate incomplete…

Computer Vision and Pattern Recognition · Computer Science 2024-09-20 Qi Fan, Haolin Zuo, Rui Liu, Zheng Lian, Guanglai Gao

FANoise: Singular Value-Adaptive Noise Modulation for Robust Multimodal Representation Learning

Representation learning is fundamental to modern machine learning, powering applications such as text retrieval and multimodal understanding. However, learning robust and generalizable representations remains challenging. While prior work…

Machine Learning · Computer Science 2025-11-27 Jiaoyang Li, Jun Fang, Tianhao Gao, Xiaohui Zhang, Zhiyuan Liu, Chao Liu, Pengzhang Liu, Qixia Jiang

Two-stage Deep Denoising with Self-guided Noise Attention for Multimodal Medical Images

Medical image denoising is considered among the most challenging vision tasks. Despite the real-world implications, existing denoising methods have notable drawbacks as they often generate visual artifacts when applied to heterogeneous…

Image and Video Processing · Electrical Eng. & Systems 2025-03-11 S M A Sharif, Rizwan Ali Naqvi, Woong-Kee Loh

Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications

In many machine learning systems that jointly learn from multiple modalities, a core research question is to understand the nature of multimodal interactions: how modalities combine to provide new task-relevant information that was not…

Machine Learning · Computer Science 2024-06-14 Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alex Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Ruslan Salakhutdinov

A Multimodal-Multitask Framework with Cross-modal Relation and Hierarchical Interactive Attention for Semantic Comprehension

A major challenge in multimodal learning is the presence of noise within individual modalities. This noise inherently affects the resulting multimodal representations, especially when these representations are obtained through explicit…

Computer Vision and Pattern Recognition · Computer Science 2025-08-25 Mohammad Zia Ur Rehman, Devraj Raghuvanshi, Umang Jain, Shubhi Bansal, Nagendra Kumar

More for Less: Non-Intrusive Speech Quality Assessment with Limited Annotations

Non-intrusive speech quality assessment is a crucial operation in multimedia applications. The scarcity of annotated data and the lack of a reference signal represent some of the main challenges for designing efficient quality assessment…

Audio and Speech Processing · Electrical Eng. & Systems 2021-08-20 Alessandro Ragano, Emmanouil Benetos, Andrew Hines

SelfMix: Robust Learning Against Textual Label Noise with Self-Mixup Training

The conventional success of textual classification relies on annotated data, and the new paradigm of pre-trained language models (PLMs) still requires a few labeled data for downstream tasks. However, in real-world applications, label noise…

Computation and Language · Computer Science 2022-10-14 Dan Qiao, Chenchen Dai, Yuyang Ding, Juntao Li, Qiang Chen, Wenliang Chen, Min Zhang

Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning

Multi-modal learning, particularly among imaging and linguistic modalities, has made amazing strides in many high-level fundamental visual understanding problems, ranging from language grounding to dense event captioning. However, much of…

Computer Vision and Pattern Recognition · Computer Science 2019-10-28 Tanzila Rahman, Bicheng Xu, Leonid Sigal

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning

Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics,…

Machine Learning · Computer Science 2021-11-11 Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency

Investigations on Audiovisual Emotion Recognition in Noisy Conditions

In this paper we explore audiovisual emotion recognition under noisy acoustic conditions with a focus on speech features. We attempt to answer the following research questions: (i) How does speech emotion recognition perform on noisy data?…

Sound · Computer Science 2021-03-03 Michael Neumann, Ngoc Thang Vu

Detect and Correct: A Selective Noise Correction Method for Learning with Noisy Labels

Falsely annotated samples, also known as noisy labels, can significantly harm the performance of deep learning models. Two main approaches for learning with noisy labels are global noise estimation and data filtering. Global noise…

Machine Learning · Computer Science 2025-07-31 Yuval Grinberg, Nimrod Harel, Jacob Goldberger, Ofir Lindenbaum