Related papers: SMA: Submodular Modality Aligner For Data Efficien…

200 papers

Multimodal models have demonstrated powerful capabilities in complex tasks requiring multimodal alignment, including zero-shot classification and cross-modal retrieval. However, existing models typically rely on millions of paired…

Computer Vision and Pattern Recognition · Computer Science 2025-10-23 Fabian Gröger, Shuo Wen, Huyen Le, Maria Brbić

Multimodal learning, which integrates data from diverse sensory modes, plays a pivotal role in artificial intelligence. However, existing multimodal learning methods often struggle with challenges where some modalities appear more dominant…

Machine Learning · Computer Science 2024-04-02 Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao

Multimodal large language models (MLLMs) achieve strong performance by jointly processing inputs from multiple modalities, such as vision, audio, and language. However, building such models or extending them to new modalities often requires…

Machine Learning · Computer Science 2026-03-24 Md Kaykobad Reza, Ameya Patil, Edward Ayrapetian, M. Salman Asif

Multimodal learning, which aims to understand and analyze information from multiple modalities, has achieved substantial progress in the supervised regime in recent years. However, the heavy dependence on data paired with expensive human…

Machine Learning · Computer Science 2024-08-19 Yongshuo Zong, Oisin Mac Aodha, Timothy Hospedales

Multimodal learning leverages complementary information derived from different modalities, thereby enhancing performance in medical image segmentation. However, prevailing multimodal learning methods heavily rely on extensive well-annotated…

Computer Vision and Pattern Recognition · Computer Science 2024-09-05 Xiaogen Zhou, Yiyou Sun, Min Deng, Winnie Chiu Wing Chu, Qi Dou

In this paper, we aim at tackling a general but interesting cross-modality feature learning question in the remote sensing community: can a limited amount of highly discriminative (e.g., hyperspectral) training data improve the performance…

Computer Vision and Pattern Recognition · Computer Science 2019-12-19 Danfeng Hong, Naoto Yokoya, Nan Ge, Jocelyn Chanussot, Xiao Xiang Zhu

In this work, we address the critical yet underexplored challenge of symmetric multimodal-to-multimodal (MM2MM) retrieval, where queries and contexts are interchangeable. Existing universal multimodal retrieval works struggle with this…

Computer Vision and Pattern Recognition · Computer Science 2026-05-18 Wenjie Yang, Hang Yu, Yuyu Guo, Peng Di

Collaborative game-based learning environments offer rich opportunities for small-group knowledge construction, yet automatically predicting student collaboration satisfaction remains challenging. A critical barrier is modality degradation:…

Machine Learning · Computer Science 2026-05-19 Wen-Hsin Tsai, Chia-Ming Lee, Yuk-Ying Tung

Multi-modal learning aims to enhance performance by unifying models from various modalities but often faces the "modality imbalance" problem in real data, leading to a bias towards dominant modalities and neglecting others, thereby limiting…

Computer Vision and Pattern Recognition · Computer Science 2024-04-15 Yang Yang, Hongpeng Pan, Qing-Yuan Jiang, Yi Xu, Jinghui Tang

In this paper, we propose SimMLM, a simple yet powerful framework for multimodal learning with missing modalities. Unlike existing approaches that rely on sophisticated network architectures or complex data imputation techniques, SimMLM…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Sijie Li, Chen Chen, Jungong Han

Confusion and forgetting of object classes have been challenges of prime interest in Few-Shot Object Detection (FSOD). To overcome these pitfalls in metric learning based FSOD techniques, we introduce a novel Submodular Mutual Information…

Computer Vision and Pattern Recognition · Computer Science 2024-09-18 Anay Majee, Ryan Sharp, Rishabh Iyer

Image matching for both cross-view and cross-modality plays a critical role in multimodal perception. In practice, the modality gap caused by different imaging systems/styles poses great challenges to the matching task. Existing works try…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Jiangwei Ren, Xingyu Jiang, Zizhuo Li, Dingkang Liang, Xin Zhou, Xiang Bai

Combining multiple modalities carrying complementary information through multimodal learning (MML) has shown considerable benefits for diagnosing multiple pathologies. However, the robustness of multimodal models to missing modalities is…

Machine Learning · Computer Science 2024-07-31 Hava Chaptoukaev, Vincenzo Marcianó, Francesco Galati, Maria A. Zuluaga

The ability to quickly learn a new task with minimal instruction - known as few-shot learning - is a central aspect of intelligent agents. Classical few-shot benchmarks make use of few-shot samples from a single modality, but such samples…

Computer Vision and Pattern Recognition · Computer Science 2024-08-29 Zhiqiu Lin, Samuel Yu, Zhiyi Kuang, Deepak Pathak, Deva Ramanan

Multi-modal data fusion has recently shown promise in classification tasks in remote sensing. Optical data and radar data, two important yet intrinsically different data sources, are attracting more and more attention for potential…

Computer Vision and Pattern Recognition · Computer Science 2020-01-08 Jingliang Hu, Danfeng Hong, Xiao Xiang Zhu

Multimodal semantic segmentation integrates complementary information from diverse sensors for remote sensing Earth observation. However, practical systems often encounter missing modalities due to sensor failures or incomplete coverage,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-04 Lekang Wen, Liang Liao, Jing Xiao, Mi Wang

Multimodal learning integrates diverse modalities but suffers from modality imbalance, where dominant modalities suppress weaker ones due to inconsistent convergence rates. Existing methods predominantly rely on static modulation or…

Machine Learning · Computer Science 2026-02-11 Zhaocheng Liu, Zhiwen Yu, Xiaoqing Liu

Few-shot image classification remains a critical challenge in the field of computer vision, particularly in data-scarce environments. Existing methods typically rely on pre-trained visual-language models, such as CLIP. However, due to the…

Computer Vision and Pattern Recognition · Computer Science 2026-02-17 Xi Yang, Pai Peng, Wulin Xie, Xiaohuan Lu, Jie Wen

In multimodal learning, dominant modalities often overshadow others, limiting generalization. We propose Modality-Aware Sharpness-Aware Minimization (M-SAM), a model-agnostic framework that applies to many modalities and supports early and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-30 Hossein R. Nowdeh, Jie Ji, Xiaolong Ma, Fatemeh Afghah

Utilizing multi-modal data enhances scene understanding by providing complementary semantic and geometric information. Existing methods fuse features or distill knowledge from multiple modalities into a unified representation, improving…

Computer Vision and Pattern Recognition · Computer Science 2025-06-05 Jialei Chen, Xu Zheng, Danda Pani Paudel, Luc Van Gool, Hiroshi Murase, Daisuke Deguchi