Related papers: Boosting Multi-modal Model Performance with Adapti…

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

Despite the remarkable success of deep multi-modal learning in practice, it has not been well-explained in theory. Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network, which is…

Machine Learning · Computer Science 2022-03-24 Yu Huang, Junyang Lin, Chang Zhou, Hongxia Yang, Longbo Huang

Balanced Multimodal Learning via On-the-fly Gradient Modulation

Multimodal learning helps to comprehensively understand the world, by integrating different senses. Accordingly, multiple input modalities are expected to boost model performance, but we actually find that they are not fully exploited even…

Computer Vision and Pattern Recognition · Computer Science 2022-03-30 Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu

On-the-fly Modulation for Balanced Multimodal Learning

Multimodal learning is expected to boost model performance by integrating information from different modalities. However, its potential is not fully exploited because the widely-used joint training strategy, which has a uniform objective…

Computer Vision and Pattern Recognition · Computer Science 2024-10-16 Yake Wei, Di Hu, Henghui Du, Ji-Rong Wen

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

Multimodal learning has developed very fast in recent years. However, during the multimodal training process, the model tends to rely on only one modality based on which it could learn faster, thus leading to inadequate use of other…

Machine Learning · Computer Science 2024-11-05 Zirun Guo, Tao Jin, Jingyuan Chen, Zhou Zhao

Modality Equilibrium Matters: Minor-Modality-Aware Adaptive Alternating for Cross-Modal Memory Enhancement

Multimodal fusion is susceptible to modality imbalance, where dominant modalities overshadow weak ones, easily leading to biased learning and suboptimal fusion, especially for incomplete modality conditions. To address this problem, we…

Machine Learning · Computer Science 2026-03-20 Xiang Shi, Rui Zhang, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu

Modular and Parameter-Efficient Multimodal Fusion with Prompting

Recent research has made impressive progress in large-scale multimodal pre-training. In the context of the rapid growth of model size, it is necessary to seek efficient and flexible methods other than finetuning. In this paper, we propose…

Computation and Language · Computer Science 2022-03-16 Sheng Liang, Mengjie Zhao, Hinrich Schütze

Gradient-Guided Modality Decoupling for Missing-Modality Robustness

Multimodal learning with incomplete input data (missing modality) is practical and challenging. In this work, we conduct an in-depth analysis of this challenge and find that modality dominance has a significant negative impact on the model…

Computer Vision and Pattern Recognition · Computer Science 2024-02-27 Hao Wang, Shengda Luo, Guosheng Hu, Jianguo Zhang

Towards Good Practices for Missing Modality Robust Action Recognition

Standard multi-modal models assume the use of the same modalities in training and inference stages. However, in practice, the environment in which multi-modal models operate may not satisfy such assumption. As such, their performances…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Sangmin Woo, Sumin Lee, Yeonju Park, Muhammad Adi Nugroho, Changick Kim

Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation

Multimodal learning seeks to utilize data from multiple sources to improve the overall performance of downstream tasks. It is desirable for redundancies in the data to make multimodal systems robust to missing or corrupted observations in…

Computer Vision and Pattern Recognition · Computer Science 2024-10-14 Md Kaykobad Reza, Ashley Prater-Bennette, M. Salman Asif

PMR: Prototypical Modal Rebalance for Multimodal Learning

Multimodal learning (MML) aims to jointly exploit the common priors of different modalities to compensate for their inherent limitations. However, existing MML methods often optimize a uniform objective for different modalities, leading to…

Machine Learning · Computer Science 2022-11-15 Yunfeng Fan, Wenchao Xu, Haozhao Wang, Junxiao Wang, Song Guo

Improving Multi-Modal Learning with Uni-Modal Teachers

Learning multi-modal representations is an essential step towards real-world robotic applications, and various multi-modal fusion models have been developed for this purpose. However, we observe that existing models, whose objectives are…

Machine Learning · Computer Science 2021-06-22 Chenzhuang Du, Tingle Li, Yichen Liu, Zixin Wen, Tianyu Hua, Yue Wang, Hang Zhao

Quantifying and Enhancing Multi-modal Robustness with Modality Preference

Multi-modal models have shown a promising capability to effectively integrate information from various sources, yet meanwhile, they are found vulnerable to pervasive perturbations, such as uni-modal attacks and missing conditions. To…

Computer Vision and Pattern Recognition · Computer Science 2024-04-19 Zequn Yang, Yake Wei, Ce Liang, Di Hu

AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning

Multimodal learning has significantly enhanced machine learning performance but still faces numerous challenges and limitations. Imbalanced multimodal learning is one of the problems extensively studied in recent works and is typically…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Shu Shen, C. L. Philip Chen, Tong Zhang

What Makes Training Multi-Modal Classification Networks Hard?

Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart. In our…

Computer Vision and Pattern Recognition · Computer Science 2020-04-06 Weiyao Wang, Du Tran, Matt Feiszli

Boosting Multimodal Learning via Disentangled Gradient Learning

Multimodal learning often encounters the under-optimized problem and may have worse performance than unimodal learning. Existing methods attribute this problem to the imbalanced learning between modalities and rebalance them through…

Computer Vision and Pattern Recognition · Computer Science 2025-07-15 Shicai Wei, Chunbo Luo, Yang Luo

ReconBoost: Boosting Can Achieve Modality Reconcilement

This paper explores a novel multi-modal alternating learning paradigm pursuing a reconciliation between the exploitation of uni-modal features and the exploration of cross-modal interactions. This is motivated by the fact that current…

Computer Vision and Pattern Recognition · Computer Science 2024-05-16 Cong Hua, Qianqian Xu, Shilong Bao, Zhiyong Yang, Qingming Huang

MDE: Modality Discrimination Enhancement for Multi-modal Recommendation

Multi-modal recommendation systems aim to enhance performance by integrating an item's content features across various modalities with user behavior data. Effective utilization of features from different modalities requires addressing two…

Information Retrieval · Computer Science 2025-02-27 Hang Zhou, Yucheng Wang, Huijing Zhan

Revisit Modality Imbalance at the Decision Layer

Multimodal learning integrates information from different modalities to enhance model performance, yet it often suffers from modality imbalance, where dominant modalities overshadow weaker ones during joint optimization. This paper reveals…

Machine Learning · Computer Science 2025-10-17 Xiaoyu Ma, Hao Chen

Rethinking Multimodal Learning from the Perspective of Mitigating Classification Ability Disproportion

Multimodal learning (MML) is significantly constrained by modality imbalance, leading to suboptimal performance in practice. While existing approaches primarily focus on balancing the learning of different modalities to address this issue,…

Computer Vision and Pattern Recognition · Computer Science 2026-01-30 QingYuan Jiang, Longfei Huang, Yang Yang

Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks

Multi-modal learning aims to enhance performance by unifying models from various modalities but often faces the "modality imbalance" problem in real data, leading to a bias towards dominant modalities and neglecting others, thereby limiting…

Computer Vision and Pattern Recognition · Computer Science 2024-04-15 Yang Yang, Hongpeng Pan, Qing-Yuan Jiang, Yi Xu, Jinghui Tang