Related papers: Learning Unseen Modality Interaction

Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models

Multimodal learning typically relies on the assumption that all modalities are fully available during both the training and inference phases. However, in real-world scenarios, consistently acquiring complete multimodal data presents…

Computer Vision and Pattern Recognition · Computer Science 2024-07-18 Donggeun Kim, Taesup Kim

A Theory of Multimodal Learning

Human perception of the empirical world involves recognizing the diverse appearances, or 'modalities', of underlying objects. Despite the longstanding consideration of this perspective in philosophy and cognitive science, the study of…

Machine Learning · Computer Science 2023-12-19 Zhou Lu

Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications

In many machine learning systems that jointly learn from multiple modalities, a core research question is to understand the nature of multimodal interactions: how modalities combine to provide new task-relevant information that was not…

Machine Learning · Computer Science 2024-06-14 Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alex Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Ruslan Salakhutdinov

What Makes Multi-modal Learning Better than Single (Provably)

The world provides us with data of multiple modalities. Intuitively, models fusing data from different modalities outperform their uni-modal counterparts, since more information is aggregated. Recently, joining the success of deep learning,…

Machine Learning · Computer Science 2021-10-27 Yu Huang, Chenzhuang Du, Zihui Xue, Xuanyao Chen, Hang Zhao, Longbo Huang

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions

Multimodal deep learning systems which employ multiple modalities like text, image, audio, video, etc., are showing better performance in comparison with individual modalities (i.e., unimodal) systems. Multimodal machine learning involves…

Machine Learning · Computer Science 2022-01-19 Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha

Multimodal Representation Learning by Alternating Unimodal Adaptation

Multimodal learning, which integrates data from diverse sensory modes, plays a pivotal role in artificial intelligence. However, existing multimodal learning methods often struggle with challenges where some modalities appear more dominant…

Machine Learning · Computer Science 2024-04-02 Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao

Towards Uniformity and Alignment for Multimodal Representation Learning

Multimodal representation learning aims to construct a shared embedding space in which heterogeneous modalities are semantically aligned. Despite strong empirical results, InfoNCE-based objectives introduce inherent conflicts that yield…

Machine Learning · Computer Science 2026-02-11 Wenzhe Yin, Pan Zhou, Zehao Xiao, Jie Liu, Shujian Yu, Jan-Jakob Sonke, Efstratios Gavves

Continual Learning for Multiple Modalities

Continual learning aims to learn knowledge of tasks observed in sequential time steps while mitigating the forgetting of previously learned knowledge. Existing methods were designed to learn a single modality (e.g., image) over time, which…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Hyundong Jin, Eunwoo Kim

Towards Modality Generalization: A Benchmark and Prospective Analysis

Multi-modal learning has achieved remarkable success by integrating information from various modalities, achieving superior performance in tasks like recognition and retrieval compared to uni-modal approaches. However, real-world scenarios…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Xiaohao Liu, Xiaobo Xia, Zhuo Huang, See-Kiong Ng, Tat-Seng Chua

Balanced Multimodal Learning via Mutual Information

Multimodal learning has increasingly become a focal point in research, primarily due to its ability to integrate complementary information from diverse modalities. Nevertheless, modality imbalance, stemming from factors such as insufficient…

Machine Learning · Computer Science 2025-11-04 Rongrong Xie, Guido Sanguinetti

Multimodal Representation Learning and Fusion

Multi-modal learning is a fast-growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, like images, text, and audio. By using the strengths of each…

Machine Learning · Computer Science 2025-12-22 Qihang Jin, Enze Ge, Yuhang Xie, Hongying Luo, Junhao Song, Ziqian Bi, Chia Xin Liang, Jibin Guan, Joe Yeong, Xinyuan Song, Junfeng Hao

OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All

Research on multi-modal learning dominantly aligns the modalities in a unified space at training, and only a single one is taken for prediction at inference. However, for a real machine, e.g., a robot, sensors could be added or removed at…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Yuanhuiyi Lyu, Xu Zheng, Dahun Kim, Lin Wang

Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models

Traditional multimodal learners find unified representations for tasks like visual question answering, but rely heavily on paired datasets. However, an overlooked yet potentially powerful question is: can one leverage auxiliary unpaired…

Machine Learning · Computer Science 2025-10-10 Sharut Gupta, Shobhita Sundaram, Chenyu Wang, Stefanie Jegelka, Phillip Isola

Multimodal Guidance Network for Missing-Modality Inference in Content Moderation

Multimodal deep learning, especially vision-language models, has gained significant traction in recent years, greatly improving performance on many downstream tasks, including content moderation and violence detection. However, standard…

Computer Vision and Pattern Recognition · Computer Science 2024-08-05 Zhuokai Zhao, Harish Palani, Tianyi Liu, Lena Evans, Ruth Toner

Large-scale multimodal models have shown excellent performance over a series of tasks powered by the large corpus of paired multimodal training data. Generally, they are always assumed to receive modality-complete inputs. However, this…

Computer Vision and Pattern Recognition · Computer Science 2024-10-22 Lianyu Hu, Tongkai Shi, Wei Feng, Fanhua Shang, Liang Wan

UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings

Current vision-language models have been explored for multi-modal embedding tasks like information retrieval. However, they face significant challenges in real-world queries and targets involving diverse modality combinations, as existing…

Computer Vision and Pattern Recognition · Computer Science 2026-05-07 Jiajun Qin, Yuan Pu, Zhuolun He, Seunggeun Kim, David Z. Pan, Bei Yu

Multimodal sparse representation learning and applications

Unsupervised methods have proven effective for discriminative tasks in a single-modality scenario. In this paper, we present a multimodal framework for learning sparse representations that can capture semantic correlation between…

Machine Learning · Computer Science 2016-03-03 Miriam Cha, Youngjune Gwon, H. T. Kung

Multimodal Understanding Through Correlation Maximization and Minimization

Multimodal learning has mainly focused on learning large models on, and fusing feature representations from, different modalities for better performances on downstream tasks. In this work, we take a detour from this trend and study the…

Computer Vision and Pattern Recognition · Computer Science 2023-05-08 Yifeng Shi, Marc Niethammer

Balanced Multimodal Learning: An Unidirectional Dynamic Interaction Perspective

Multimodal learning typically utilizes multimodal joint loss to integrate different modalities and enhance model performance. However, this joint learning strategy can induce modality imbalance, where strong modalities overwhelm weaker ones…

Machine Learning · Computer Science 2025-09-08 Shijie Wang, Li Zhang, Xinyan Liang, Yuhua Qian, Shen Hu

MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection

Multimodal learning seeks to combine data from multiple input sources to enhance the performance of different downstream tasks. In real-world scenarios, performance can degrade substantially if some input modalities are missing. Existing…

Machine Learning · Computer Science 2024-10-10 Niki Nezakati, Md Kaykobad Reza, Ameya Patil, Mashhour Solh, M. Salman Asif