Related papers: Improving Multimodal Learning with Multi-Loss Grad…

Revisit Modality Imbalance at the Decision Layer

Multimodal learning integrates information from different modalities to enhance model performance, yet it often suffers from modality imbalance, where dominant modalities overshadow weaker ones during joint optimization. This paper reveals…

Machine Learning · Computer Science 2025-10-17 Xiaoyu Ma, Hao Chen

Improving Multimodal Learning Balance and Sufficiency through Data Remixing

Different modalities hold considerable gaps in optimization trajectories, including speeds and paths, which lead to modality laziness and modality clash when jointly training multimodal models, resulting in insufficient and imbalanced…

Machine Learning · Computer Science 2025-06-17 Xiaoyu Ma, Hao Chen, Yongjian Deng

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

Multimodal learning has developed very fast in recent years. However, during the multimodal training process, the model tends to rely on only one modality based on which it could learn faster, thus leading to inadequate use of other…

Machine Learning · Computer Science 2024-11-05 Zirun Guo, Tao Jin, Jingyuan Chen, Zhou Zhao

Improving Multi-Modal Learning with Uni-Modal Teachers

Learning multi-modal representations is an essential step towards real-world robotic applications, and various multi-modal fusion models have been developed for this purpose. However, we observe that existing models, whose objectives are…

Machine Learning · Computer Science 2021-06-22 Chenzhuang Du, Tingle Li, Yichen Liu, Zixin Wen, Tianyu Hua, Yue Wang, Hang Zhao

Balanced Multimodal Learning via On-the-fly Gradient Modulation

Multimodal learning helps to comprehensively understand the world, by integrating different senses. Accordingly, multiple input modalities are expected to boost model performance, but we actually find that they are not fully exploited even…

Computer Vision and Pattern Recognition · Computer Science 2022-03-30 Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu

Diagnosing and Re-learning for Balanced Multimodal Learning

To overcome the imbalanced multimodal learning problem, where models prefer the training of specific modalities, existing methods propose to control the training of uni-modal encoders from different perspectives, taking the inter-modal…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu

ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation

Multimodal Federated Learning frequently encounters challenges of client modality heterogeneity, leading to undesired performances for secondary modality in multimodal learning. It is particularly prevalent in audiovisual learning, with…

Audio and Speech Processing · Electrical Eng. & Systems 2024-08-29 Tiantian Feng, Tuo Zhang, Salman Avestimehr, Shrikanth S. Narayanan

Two-Stage Triplet Loss Training with Curriculum Augmentation for Audio-Visual Retrieval

The cross-modal retrieval model leverages the potential of triplet loss optimization to learn robust embedding spaces. However, existing methods often train these models in a singular pass, overlooking the distinction between semi-hard and…

Sound · Computer Science 2023-10-23 Donghuo Zeng, Kazushi Ikeda

Quantifying Multimodal Imbalance: A GMM-Guided Adaptive Loss for Audio-Visual Learning

Multimodal learning integrates diverse modalities but suffers from modality imbalance, where dominant modalities suppress weaker ones due to inconsistent convergence rates. Existing methods predominantly rely on static modulation or…

Machine Learning · Computer Science 2026-02-11 Zhaocheng Liu, Zhiwen Yu, Xiaoqing Liu

MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning

Audio-visual learning helps to comprehensively understand the world by fusing practical information from multiple modalities. However, recent studies show that the imbalanced optimization of uni-modal encoders in a joint-learning model is a…

Sound · Computer Science 2023-03-14 Ruize Xu, Ruoxuan Feng, Shi-Xiong Zhang, Di Hu

Modality Equilibrium Matters: Minor-Modality-Aware Adaptive Alternating for Cross-Modal Memory Enhancement

Multimodal fusion is susceptible to modality imbalance, where dominant modalities overshadow weak ones, easily leading to biased learning and suboptimal fusion, especially for incomplete modality conditions. To address this problem, we…

Machine Learning · Computer Science 2026-03-20 Xiang Shi, Rui Zhang, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu

Boosting Multimodal Learning via Disentangled Gradient Learning

Multimodal learning often encounters the under-optimized problem and may have worse performance than unimodal learning. Existing methods attribute this problem to the imbalanced learning between modalities and rebalance them through…

Computer Vision and Pattern Recognition · Computer Science 2025-07-15 Shicai Wei, Chunbo Luo, Yang Luo

Multimodal Representation Learning by Alternating Unimodal Adaptation

Multimodal learning, which integrates data from diverse sensory modes, plays a pivotal role in artificial intelligence. However, existing multimodal learning methods often struggle with challenges where some modalities appear more dominant…

Machine Learning · Computer Science 2024-04-02 Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao

On Robustness in Multimodal Learning

Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text. In this work, we are concerned with understanding how models behave as the type of modalities differ between training…

Machine Learning · Computer Science 2023-04-12 Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev

Reconstruction-Driven Multimodal Representation Learning for Automated Media Understanding

Broadcast and media organizations increasingly rely on artificial intelligence to automate the labor-intensive processes of content indexing, tagging, and metadata generation. However, existing AI systems typically operate on a single…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Yassir Benhammou, Suman Kalyan, Sujay Kumar

Asymmetric Reinforcing against Multi-modal Representation Bias

The strength of multimodal learning lies in its ability to integrate information from various sources, providing rich and comprehensive insights. However, in real-world scenarios, multi-modal systems often face the challenge of dynamic…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Xiyuan Gao, Bing Cao, Pengfei Zhu, Nannan Wang, Qinghua Hu

Contribution-Guided Asymmetric Learning for Robust Multimodal Fusion under Imbalance and Noise

Multimodal learning faces two major challenges: modality imbalance and data noise, which significantly affect the robustness and generalization ability of models. Existing methods achieve modality balance by suppressing dominant modalities,…

Multimedia · Computer Science 2025-11-17 Zijing Xu, Yunfeng Kou, Kunming Wu, Hong Liu

AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning

Multimodal learning has significantly enhanced machine learning performance but still faces numerous challenges and limitations. Imbalanced multimodal learning is one of the problems extensively studied in recent works and is typically…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Shu Shen, C. L. Philip Chen, Tong Zhang

Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach

Multimodal networks have demonstrated remarkable performance improvements over their unimodal counterparts. Existing multimodal networks are designed in a multi-branch fashion that, due to the reliance on fusion strategies, exhibit…

Computer Vision and Pattern Recognition · Computer Science 2024-08-15 Muhammad Saad Saeed, Shah Nawaz, Muhammad Zaigham Zaheer, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf, Hassan Sajjad, Tom De Schepper, Markus Schedl

Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models

This paper investigates how to better leverage large-scale pre-trained uni-modal models to further enhance discriminative multi-modal learning. Even when fine-tuned with only uni-modal data, these models can outperform previous multi-modal…

Computer Vision and Pattern Recognition · Computer Science 2023-10-10 Chenzhuang Du, Yue Zhao, Chonghua Liao, Jiacheng You, Jie Fu, Hang Zhao