Related papers: DeepSuM: Deep Sufficient Modality Learning Framewo…

Robust Multimodal Learning via Representation Decoupling

Multimodal learning robust to missing modality has attracted increasing attention due to its practicality. Existing methods tend to address it by learning a common subspace representation for different modality combinations. However, we…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Shicai Wei, Yang Luo, Yuji Wang, Chunbo Luo

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions

Multimodal deep learning systems which employ multiple modalities like text, image, audio, video, etc., are showing better performance in comparison with individual modalities (i.e., unimodal) systems. Multimodal machine learning involves…

Machine Learning · Computer Science 2022-01-19 Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha

MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection

Multimodal learning seeks to combine data from multiple input sources to enhance the performance of different downstream tasks. In real-world scenarios, performance can degrade substantially if some input modalities are missing. Existing…

Machine Learning · Computer Science 2024-10-10 Niki Nezakati, Md Kaykobad Reza, Ameya Patil, Mashhour Solh, M. Salman Asif

Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications

Multimodality Representation Learning, as a technique of learning to embed information from different modalities and their correlations, has achieved remarkable success on a variety of applications, such as Visual Question Answering (VQA),…

Artificial Intelligence · Computer Science 2024-03-04 Muhammad Arslan Manzoor, Sarah Albarri, Ziting Xian, Zaiqiao Meng, Preslav Nakov, Shangsong Liang

HyperLearn: A Distributed Approach for Representation Learning in Datasets With Many Modalities

Multimodal datasets contain an enormous amount of relational information, which grows exponentially with the introduction of new modalities. Learning representations in such a scenario is inherently complex due to the presence of multiple…

Machine Learning · Computer Science 2019-09-24 Devanshu Arya, Stevan Rudinac, Marcel Worring

Modal Uncertainty Estimation via Discrete Latent Representation

Many important problems in the real world don't have unique solutions. It is thus important for machine learning models to be capable of proposing different plausible solutions with meaningful probability measures. In this work we introduce…

Machine Learning · Computer Science 2020-07-28 Di Qiu, Lok Ming Lui

Fusion or Confusion? Multimodal Complexity Is Not All You Need

Multimodal learning has become a prominent research area, with the potential of substantial performance gains by combining information across modalities. At the same time, model development has trended toward increasingly complex deep…

Machine Learning · Computer Science 2026-05-08 Tillmann Rheude, Roland Eils, Benjamin Wild

Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models

Multimodal learning typically relies on the assumption that all modalities are fully available during both the training and inference phases. However, in real-world scenarios, consistently acquiring complete multimodal data presents…

Computer Vision and Pattern Recognition · Computer Science 2024-07-18 Donggeun Kim, Taesup Kim

HyperMM : Robust Multimodal Learning with Varying-sized Inputs

Combining multiple modalities carrying complementary information through multimodal learning (MML) has shown considerable benefits for diagnosing multiple pathologies. However, the robustness of multimodal models to missing modalities is…

Machine Learning · Computer Science 2024-07-31 Hava Chaptoukaev, Vincenzo Marcianó, Francesco Galati, Maria A. Zuluaga

On Robustness in Multimodal Learning

Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text. In this work, we are concerned with understanding how models behave as the type of modalities differ between training…

Machine Learning · Computer Science 2023-04-12 Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev

Deep Multimodal Learning with Missing Modality: A Survey

During multimodal model training and testing, certain data modalities may be absent due to sensor limitations, cost constraints, privacy concerns, or data loss, negatively affecting performance. Multimodal learning techniques designed to…

Computer Vision and Pattern Recognition · Computer Science 2026-02-05 Renjie Wu, Hu Wang, Hsiang-Ting Chen, Gustavo Carneiro

Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning

Ensembling a neural network is a widely recognized approach to enhance model performance, estimate uncertainty, and improve robustness in deep supervised learning. However, deep ensembles often come with high computational costs and memory…

Machine Learning · Statistics 2023-09-04 Amirhossein Vahidi, Lisa Wimmer, Hüseyin Anil Gündüz, Bernd Bischl, Eyke Hüllermeier, Mina Rezaei

Multimodal Representation Learning and Fusion

Multi-modal learning is a fast growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, like images, text, and audio. By using the strengths of each…

Machine Learning · Computer Science 2025-12-22 Qihang Jin, Enze Ge, Yuhang Xie, Hongying Luo, Junhao Song, Ziqian Bi, Chia Xin Liang, Jibin Guan, Joe Yeong, Xinyuan Song, Junfeng Hao

Dynamic Modality Scheduling for Multimodal Large Models via Confidence, Uncertainty, and Semantic Consistency

Multimodal Large Models (MLLMs) have achieved remarkable progress in vision-language understanding and generation tasks. However, existing MLLMs typically rely on static modality fusion strategies, which treat all modalities equally…

Computer Vision and Pattern Recognition · Computer Science 2025-06-17 Hiroshi Tanaka, Anika Rao, Hana Satou, Michael Johnson, Sofia García

Multimodal Learning for MIMO Beam Prediction Based on Variational Inference

Accurate beam prediction is essential for mitigating signalling overhead and latency in integrated sensing and communication-enabled massive multi-input multi-output systems. With the aid of multimodal learning, the prediction accuracy can…

Signal Processing · Electrical Eng. & Systems 2026-05-15 Zijian Zheng, Wenqiang Yi, Hyundong Shin, Arumugam Nallanathan

DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning

Multimodal learning integrates complementary information from diverse modalities to enhance the decision-making process. However, the potential of multimodal collaboration remains under-exploited due to disparities in data quality and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-29 Chengxuan Qian, Kai Han, Jiaxin Liu, Zhenlong Yuan, Zhengzhong Zhu, Jingchao Wang, Chongwen Lyu, Jun Chen, Zhe Liu

Recent Advances and Trends in Multimodal Deep Learning: A Review

Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning is to create models that can process and link information using various modalities. Despite…

Computer Vision and Pattern Recognition · Computer Science 2021-05-25 Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Songyuan Li, Jabbar Abdul

A Review on Methods and Applications in Multimodal Deep Learning

Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities.…

Machine Learning · Computer Science 2022-02-21 Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Jabbar Abdul

Self-Supervised MultiModal Versatile Networks

Videos are a rich source of multi-modal supervision. In this work, we learn representations using self-supervision by leveraging three modalities naturally present in videos: visual, audio and language streams. To this end, we introduce the…

Computer Vision and Pattern Recognition · Computer Science 2020-11-02 Jean-Baptiste Alayrac, Adrià Recasens, Rosalia Schneider, Relja Arandjelović, Jason Ramapuram, Jeffrey De Fauw, Lucas Smaira, Sander Dieleman, Andrew Zisserman

DeepInteraction++: Multi-Modality Interaction for Autonomous Driving

Existing top-performance autonomous driving systems typically rely on the multi-modal fusion strategy for reliable scene understanding. This design is however fundamentally restricted due to overlooking the modality-specific strengths and…

Computer Vision and Pattern Recognition · Computer Science 2025-02-24 Zeyu Yang, Nan Song, Wei Li, Xiatian Zhu, Li Zhang, Philip H. S. Torr