Related papers: Classifier-guided Gradient Modulation for Enhanced…

200 papers

Multimodal learning helps to comprehensively understand the world, by integrating different senses. Accordingly, multiple input modalities are expected to boost model performance, but we actually find that they are not fully exploited even…

Computer Vision and Pattern Recognition · Computer Science 2022-03-30 Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu

Multimodal learning with incomplete input data (missing modality) is practical and challenging. In this work, we conduct an in-depth analysis of this challenge and find that modality dominance has a significant negative impact on the model…

Computer Vision and Pattern Recognition · Computer Science 2024-02-27 Hao Wang, Shengda Luo, Guosheng Hu, Jianguo Zhang

Multimodal Domain Generalization (MMDG) leverages the complementary strengths of multiple modalities to enhance model generalization on unseen domains. A central challenge in multimodal learning is optimization imbalance, where modalities…

Machine Learning · Computer Science 2026-03-17 Hongzhao Li, Guohao Shen, Shupan Li, Mingliang Xu, Muhammad Haris Khan

Multi-Modal Learning (MML) integrates information from diverse modalities to improve predictive accuracy. While existing optimization strategies have made significant strides by mitigating gradient direction conflicts, we revisit MML from a…

Machine Learning · Computer Science 2026-02-09 Peizheng Guo, Jingyao Wang, Wenwen Qiang, Jiahuan Zhou, Changwen Zheng, Gang Hua

Multimodal learning often encounters the under-optimized problem and may have worse performance than unimodal learning. Existing methods attribute this problem to the imbalanced learning between modalities and rebalance them through…

Computer Vision and Pattern Recognition · Computer Science 2025-07-15 Shicai Wei, Chunbo Luo, Yang Luo

Multimodal learning has attracted increasing attention due to its practicality. However, it often suffers from insufficient optimization, where the multimodal model underperforms even compared to its unimodal counterparts. Existing methods…

Computer Vision and Pattern Recognition · Computer Science 2026-04-08 Shicai Wei, Chunbo Luo, Qiang Zhu, Yang Luo

This paper presents a novel neural network training approach for faster convergence and better generalization abilities in deep reinforcement learning. Particularly, we focus on the enhancement of training and evaluation performance in…

Machine Learning · Computer Science 2020-05-26 Mohammed Sharafath Abdul Hameed, Gavneet Singh Chadha, Andreas Schwung, Steven X. Ding

Multimodal learning is expected to boost model performance by integrating information from different modalities. However, its potential is not fully exploited because the widely-used joint training strategy, which has a uniform objective…

Computer Vision and Pattern Recognition · Computer Science 2024-10-16 Yake Wei, Di Hu, Henghui Du, Ji-Rong Wen

Training multimodal networks requires a vast amount of data due to their larger parameter space compared to unimodal networks. Active learning is a widely used technique for reducing data annotation costs by selecting only those samples…

Multimedia · Computer Science 2023-08-22 Meng Shen, Yizheng Huang, Jianxiong Yin, Heqing Zou, Deepu Rajan, Simon See

Learning from multiple modalities, such as audio and video, offers opportunities for leveraging complementary information, enhancing robustness, and improving contextual understanding and performance. However, combining such modalities…

Multimedia · Computer Science 2024-10-15 Konstantinos Kontras, Christos Chatzichristos, Matthew Blaschko, Maarten De Vos

While the field of multi-modal learning keeps growing fast, the deficiency of the standard joint training paradigm has become clear through recent studies. They attribute the sub-optimal performance of the jointly trained model to the…

Computer Vision and Pattern Recognition · Computer Science 2023-08-16 Hong Li, Xingyu Li, Pengbo Hu, Yinuo Lei, Chunxiao Li, Yi Zhou

This paper explores a novel multi-modal alternating learning paradigm pursuing a reconciliation between the exploitation of uni-modal features and the exploration of cross-modal interactions. This is motivated by the fact that current…

Computer Vision and Pattern Recognition · Computer Science 2024-05-16 Cong Hua, Qianqian Xu, Shilong Bao, Zhiyong Yang, Qingming Huang

To overcome the imbalanced multimodal learning problem, where models prefer the training of specific modalities, existing methods propose to control the training of uni-modal encoders from different perspectives, taking the inter-modal…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu

Gradient-based meta-learners such as MAML are able to learn a meta-prior from similar tasks to adapt to novel tasks from the same distribution with few gradient updates. One important limitation of such frameworks is that they seek a common…

Machine Learning · Computer Science 2018-12-19 Risto Vuorio, Shao-Hua Sun, Hexiang Hu, Joseph J. Lim

Multimodal learning integrates diverse modalities but suffers from modality imbalance, where dominant modalities suppress weaker ones due to inconsistent convergence rates. Existing methods predominantly rely on static modulation or…

Machine Learning · Computer Science 2026-02-11 Zhaocheng Liu, Zhiwen Yu, Xiaoqing Liu

Multimodal learning (MML) is significantly constrained by modality imbalance, leading to suboptimal performance in practice. While existing approaches primarily focus on balancing the learning of different modalities to address this issue,…

Computer Vision and Pattern Recognition · Computer Science 2026-01-30 QingYuan Jiang, Longfei Huang, Yang Yang

Multimodal learning aims to leverage information from diverse data modalities to achieve more comprehensive performance. However, conventional multimodal models often suffer from modality imbalance, where one or a few modalities dominate…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Mohammed Rakib, Arunkumar Bagavathi

Meta-learning offers a principled framework leveraging task-invariant priors from related tasks, with which task-specific models can be fine-tuned on downstream tasks, even with limited data records. Gradient-based…

Machine Learning · Computer Science 2026-04-16 Yilang Zhang, Abraham Jaeger Mountain, Bingcong Li, Georgios B. Giannakis

Fusing data from multiple modalities provides more information to train machine learning systems. However, it is prohibitively expensive and time-consuming to label each modality with a large amount of data, which leads to a crucial problem…

Computer Vision and Pattern Recognition · Computer Science 2020-07-15 Xinwei Sun, Yilun Xu, Peng Cao, Yuqing Kong, Lingjing Hu, Shanghang Zhang, Yizhou Wang

Instruction tuning in multimodal large language models (MLLMs) generally involves cooperative learning between a backbone LLM and a feature encoder of non-text input modalities. The major challenge is how to efficiently find the synergy…

Machine Learning · Computer Science 2025-09-10 Xintong Li, Junda Wu, Tong Yu, Yu Wang, Xiang Chen, Jiuxiang Gu, Lina Yao, Julian McAuley, Jingbo Shang