English
Related papers

Related papers: DeepSuM: Deep Sufficient Modality Learning Framewo…

200 papers

Multimodal learning robust to missing modality has attracted increasing attention due to its practicality. Existing methods tend to address it by learning a common subspace representation for different modality combinations. However, we…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Shicai Wei , Yang Luo , Yuji Wang , Chunbo Luo

Multimodal deep learning systems which employ multiple modalities like text, image, audio, video, etc., are showing better performance in comparison with individual modalities (i.e., unimodal) systems. Multimodal machine learning involves…

Machine Learning · Computer Science 2022-01-19 Anil Rahate , Rahee Walambe , Sheela Ramanna , Ketan Kotecha

Multimodal learning seeks to combine data from multiple input sources to enhance the performance of different downstream tasks. In real-world scenarios, performance can degrade substantially if some input modalities are missing. Existing…

Machine Learning · Computer Science 2024-10-10 Niki Nezakati , Md Kaykobad Reza , Ameya Patil , Mashhour Solh , M. Salman Asif

Multimodality Representation Learning, as a technique of learning to embed information from different modalities and their correlations, has achieved remarkable success on a variety of applications, such as Visual Question Answering (VQA),…

Artificial Intelligence · Computer Science 2024-03-04 Muhammad Arslan Manzoor , Sarah Albarri , Ziting Xian , Zaiqiao Meng , Preslav Nakov , Shangsong Liang

Multimodal datasets contain an enormous amount of relational information, which grows exponentially with the introduction of new modalities. Learning representations in such a scenario is inherently complex due to the presence of multiple…

Machine Learning · Computer Science 2019-09-24 Devanshu Arya , Stevan Rudinac , Marcel Worring

Many important problems in the real world don't have unique solutions. It is thus important for machine learning models to be capable of proposing different plausible solutions with meaningful probability measures. In this work we introduce…

Machine Learning · Computer Science 2020-07-28 Di Qiu , Lok Ming Lui

Multimodal learning has become a prominent research area, with the potential of substantial performance gains by combining information across modalities. At the same time, model development has trended toward increasingly complex deep…

Machine Learning · Computer Science 2026-05-08 Tillmann Rheude , Roland Eils , Benjamin Wild

Multimodal learning typically relies on the assumption that all modalities are fully available during both the training and inference phases. However, in real-world scenarios, consistently acquiring complete multimodal data presents…

Computer Vision and Pattern Recognition · Computer Science 2024-07-18 Donggeun Kim , Taesup Kim

Combining multiple modalities carrying complementary information through multimodal learning (MML) has shown considerable benefits for diagnosing multiple pathologies. However, the robustness of multimodal models to missing modalities is…

Machine Learning · Computer Science 2024-07-31 Hava Chaptoukaev , Vincenzo Marcianó , Francesco Galati , Maria A. Zuluaga

Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text. In this work, we are concerned with understanding how models behave as the type of modalities differ between training…

Machine Learning · Computer Science 2023-04-12 Brandon McKinzie , Joseph Cheng , Vaishaal Shankar , Yinfei Yang , Jonathon Shlens , Alexander Toshev

During multimodal model training and testing, certain data modalities may be absent due to sensor limitations, cost constraints, privacy concerns, or data loss, negatively affecting performance. Multimodal learning techniques designed to…

Computer Vision and Pattern Recognition · Computer Science 2026-02-05 Renjie Wu , Hu Wang , Hsiang-Ting Chen , Gustavo Carneiro

Ensembling a neural network is a widely recognized approach to enhance model performance, estimate uncertainty, and improve robustness in deep supervised learning. However, deep ensembles often come with high computational costs and memory…

Multi-modal learning is a fast growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, like images, text, and audio. By using the strengths of each…

Machine Learning · Computer Science 2025-12-22 Qihang Jin , Enze Ge , Yuhang Xie , Hongying Luo , Junhao Song , Ziqian Bi , Chia Xin Liang , Jibin Guan , Joe Yeong , Xinyuan Song , Junfeng Hao

Multimodal Large Models (MLLMs) have achieved remarkable progress in vision-language understanding and generation tasks. However, existing MLLMs typically rely on static modality fusion strategies, which treat all modalities equally…

Computer Vision and Pattern Recognition · Computer Science 2025-06-17 Hiroshi Tanaka , Anika Rao , Hana Satou , Michael Johnson , Sofia García

Accurate beam prediction is essential for mitigating signalling overhead and latency in integrated sensing and communication-enabled massive multi-input multi-output systems. With the aid of multimodal learning, the prediction accuracy can…

Signal Processing · Electrical Eng. & Systems 2026-05-15 Zijian Zheng , Wenqiang Yi , Hyundong Shin , Arumugam Nallanathan

Multimodal learning integrates complementary information from diverse modalities to enhance the decision-making process. However, the potential of multimodal collaboration remains under-exploited due to disparities in data quality and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-29 Chengxuan Qian , Kai Han , Jiaxin Liu , Zhenlong Yuan , Zhengzhong Zhu , Jingchao Wang , Chongwen Lyu , Jun Chen , Zhe Liu

Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning is to create models that can process and link information using various modalities. Despite…

Computer Vision and Pattern Recognition · Computer Science 2021-05-25 Jabeen Summaira , Xi Li , Amin Muhammad Shoib , Songyuan Li , Jabbar Abdul

Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities.…

Machine Learning · Computer Science 2022-02-21 Jabeen Summaira , Xi Li , Amin Muhammad Shoib , Jabbar Abdul

Videos are a rich source of multi-modal supervision. In this work, we learn representations using self-supervision by leveraging three modalities naturally present in videos: visual, audio and language streams. To this end, we introduce the…

Computer Vision and Pattern Recognition · Computer Science 2020-11-02 Jean-Baptiste Alayrac , Adrià Recasens , Rosalia Schneider , Relja Arandjelović , Jason Ramapuram , Jeffrey De Fauw , Lucas Smaira , Sander Dieleman , Andrew Zisserman

Existing top-performance autonomous driving systems typically rely on the multi-modal fusion strategy for reliable scene understanding. This design is however fundamentally restricted due to overlooking the modality-specific strengths and…

Computer Vision and Pattern Recognition · Computer Science 2025-02-24 Zeyu Yang , Nan Song , Wei Li , Xiatian Zhu , Li Zhang , Philip H. S. Torr
‹ Prev 1 2 3 10 Next ›