Related papers: Learning Unseen Modality Interaction

200 papers

Multimodal learning typically relies on the assumption that all modalities are fully available during both the training and inference phases. However, in real-world scenarios, consistently acquiring complete multimodal data presents…

Computer Vision and Pattern Recognition · Computer Science 2024-07-18 Donggeun Kim, Taesup Kim

Human perception of the empirical world involves recognizing the diverse appearances, or 'modalities', of underlying objects. Despite the longstanding consideration of this perspective in philosophy and cognitive science, the study of…

Machine Learning · Computer Science 2023-12-19 Zhou Lu

In many machine learning systems that jointly learn from multiple modalities, a core research question is to understand the nature of multimodal interactions: how modalities combine to provide new task-relevant information that was not…

The world provides us with data of multiple modalities. Intuitively, models fusing data from different modalities outperform their uni-modal counterparts, since more information is aggregated. Recently, joining the success of deep learning,…

Machine Learning · Computer Science 2021-10-27 Yu Huang, Chenzhuang Du, Zihui Xue, Xuanyao Chen, Hang Zhao, Longbo Huang

Multimodal deep learning systems which employ multiple modalities like text, image, audio, video, etc., are showing better performance in comparison with individual modalities (i.e., unimodal) systems. Multimodal machine learning involves…

Machine Learning · Computer Science 2022-01-19 Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha

Multimodal learning, which integrates data from diverse sensory modes, plays a pivotal role in artificial intelligence. However, existing multimodal learning methods often struggle with challenges where some modalities appear more dominant…

Machine Learning · Computer Science 2024-04-02 Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao

Multimodal representation learning aims to construct a shared embedding space in which heterogeneous modalities are semantically aligned. Despite strong empirical results, InfoNCE-based objectives introduce inherent conflicts that yield…

Machine Learning · Computer Science 2026-02-11 Wenzhe Yin, Pan Zhou, Zehao Xiao, Jie Liu, Shujian Yu, Jan-Jakob Sonke, Efstratios Gavves

Continual learning aims to learn knowledge of tasks observed in sequential time steps while mitigating the forgetting of previously learned knowledge. Existing methods were designed to learn a single modality (e.g., image) over time, which…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Hyundong Jin, Eunwoo Kim

Multi-modal learning has achieved remarkable success by integrating information from various modalities, achieving superior performance in tasks like recognition and retrieval compared to uni-modal approaches. However, real-world scenarios…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Xiaohao Liu, Xiaobo Xia, Zhuo Huang, See-Kiong Ng, Tat-Seng Chua

Multimodal learning has increasingly become a focal point in research, primarily due to its ability to integrate complementary information from diverse modalities. Nevertheless, modality imbalance, stemming from factors such as insufficient…

Machine Learning · Computer Science 2025-11-04 Rongrong Xie, Guido Sanguinetti

Multi-modal learning is a fast-growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, like images, text, and audio. By using the strengths of each…

Machine Learning · Computer Science 2025-12-22 Qihang Jin, Enze Ge, Yuhang Xie, Hongying Luo, Junhao Song, Ziqian Bi, Chia Xin Liang, Jibin Guan, Joe Yeong, Xinyuan Song, Junfeng Hao

Research on multi-modal learning dominantly aligns the modalities in a unified space at training, and only a single one is taken for prediction at inference. However, for a real machine, e.g., a robot, sensors could be added or removed at…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Yuanhuiyi Lyu, Xu Zheng, Dahun Kim, Lin Wang

Traditional multimodal learners find unified representations for tasks like visual question answering, but rely heavily on paired datasets. However, an overlooked yet potentially powerful question is: can one leverage auxiliary unpaired…

Machine Learning · Computer Science 2025-10-10 Sharut Gupta, Shobhita Sundaram, Chenyu Wang, Stefanie Jegelka, Phillip Isola

Multimodal deep learning, especially vision-language models, has gained significant traction in recent years, greatly improving performance on many downstream tasks, including content moderation and violence detection. However, standard…

Computer Vision and Pattern Recognition · Computer Science 2024-08-05 Zhuokai Zhao, Harish Palani, Tianyi Liu, Lena Evans, Ruth Toner

Large-scale multimodal models have shown excellent performance over a series of tasks powered by the large corpus of paired multimodal training data. Generally, they are always assumed to receive modality-complete inputs. However, this…

Computer Vision and Pattern Recognition · Computer Science 2024-10-22 Lianyu Hu, Tongkai Shi, Wei Feng, Fanhua Shang, Liang Wan

Current vision-language models have been explored for multi-modal embedding tasks like information retrieval. However, they face significant challenges in real-world queries and targets involving diverse modality combinations, as existing…

Computer Vision and Pattern Recognition · Computer Science 2026-05-07 Jiajun Qin, Yuan Pu, Zhuolun He, Seunggeun Kim, David Z. Pan, Bei Yu

Unsupervised methods have proven effective for discriminative tasks in a single-modality scenario. In this paper, we present a multimodal framework for learning sparse representations that can capture semantic correlation between…

Machine Learning · Computer Science 2016-03-03 Miriam Cha, Youngjune Gwon, H. T. Kung

Multimodal learning has mainly focused on learning large models on, and fusing feature representations from, different modalities for better performances on downstream tasks. In this work, we take a detour from this trend and study the…

Computer Vision and Pattern Recognition · Computer Science 2023-05-08 Yifeng Shi, Marc Niethammer

Multimodal learning typically utilizes multimodal joint loss to integrate different modalities and enhance model performance. However, this joint learning strategy can induce modality imbalance, where strong modalities overwhelm weaker ones…

Machine Learning · Computer Science 2025-09-08 Shijie Wang, Li Zhang, Xinyan Liang, Yuhua Qian, Shen Hu

Multimodal learning seeks to combine data from multiple input sources to enhance the performance of different downstream tasks. In real-world scenarios, performance can degrade substantially if some input modalities are missing. Existing…

Machine Learning · Computer Science 2024-10-10 Niki Nezakati, Md Kaykobad Reza, Ameya Patil, Mashhour Solh, M. Salman Asif