Related papers: Improving Multimodal Learning with Multi-Loss Grad…

200 papers

Multimodal learning integrates information from different modalities to enhance model performance, yet it often suffers from modality imbalance, where dominant modalities overshadow weaker ones during joint optimization. This paper reveals…

Machine Learning · Computer Science · 2025-10-17 · Xiaoyu Ma, Hao Chen

Different modalities hold considerable gaps in optimization trajectories, including speeds and paths, which lead to modality laziness and modality clash when jointly training multimodal models, resulting in insufficient and imbalanced…

Machine Learning · Computer Science · 2025-06-17 · Xiaoyu Ma, Hao Chen, Yongjian Deng

Multimodal learning has developed very fast in recent years. However, during the multimodal training process, the model tends to rely on only one modality based on which it could learn faster, thus leading to inadequate use of other…

Machine Learning · Computer Science · 2024-11-05 · Zirun Guo, Tao Jin, Jingyuan Chen, Zhou Zhao

Learning multi-modal representations is an essential step towards real-world robotic applications, and various multi-modal fusion models have been developed for this purpose. However, we observe that existing models, whose objectives are…

Machine Learning · Computer Science · 2021-06-22 · Chenzhuang Du, Tingle Li, Yichen Liu, Zixin Wen, Tianyu Hua, Yue Wang, Hang Zhao

Multimodal learning helps to comprehensively understand the world, by integrating different senses. Accordingly, multiple input modalities are expected to boost model performance, but we actually find that they are not fully exploited even…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-30 · Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu

To overcome the imbalanced multimodal learning problem, where models prefer the training of specific modalities, existing methods propose to control the training of uni-modal encoders from different perspectives, taking the inter-modal…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-16 · Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu

Multimodal Federated Learning frequently encounters challenges of client modality heterogeneity, leading to undesired performance for the secondary modality in multimodal learning. It is particularly prevalent in audiovisual learning, with…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-08-29 · Tiantian Feng, Tuo Zhang, Salman Avestimehr, Shrikanth S. Narayanan

The cross-modal retrieval model leverages the potential of triplet loss optimization to learn robust embedding spaces. However, existing methods often train these models in a single pass, overlooking the distinction between semi-hard and…

Sound · Computer Science · 2023-10-23 · Donghuo Zeng, Kazushi Ikeda

Multimodal learning integrates diverse modalities but suffers from modality imbalance, where dominant modalities suppress weaker ones due to inconsistent convergence rates. Existing methods predominantly rely on static modulation or…

Machine Learning · Computer Science · 2026-02-11 · Zhaocheng Liu, Zhiwen Yu, Xiaoqing Liu

Audio-visual learning helps to comprehensively understand the world by fusing practical information from multiple modalities. However, recent studies show that the imbalanced optimization of uni-modal encoders in a joint-learning model is a…

Sound · Computer Science · 2023-03-14 · Ruize Xu, Ruoxuan Feng, Shi-Xiong Zhang, Di Hu

Multimodal fusion is susceptible to modality imbalance, where dominant modalities overshadow weak ones, easily leading to biased learning and suboptimal fusion, especially for incomplete modality conditions. To address this problem, we…

Machine Learning · Computer Science · 2026-03-20 · Xiang Shi, Rui Zhang, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu

Multimodal learning often encounters the under-optimized problem and may have worse performance than unimodal learning. Existing methods attribute this problem to the imbalanced learning between modalities and rebalance them through…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-15 · Shicai Wei, Chunbo Luo, Yang Luo

Multimodal learning, which integrates data from diverse sensory modes, plays a pivotal role in artificial intelligence. However, existing multimodal learning methods often struggle with challenges where some modalities appear more dominant…

Machine Learning · Computer Science · 2024-04-02 · Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao

Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text. In this work, we are concerned with understanding how models behave as the types of modalities differ between training…

Machine Learning · Computer Science · 2023-04-12 · Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev

Broadcast and media organizations increasingly rely on artificial intelligence to automate the labor-intensive processes of content indexing, tagging, and metadata generation. However, existing AI systems typically operate on a single…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-25 · Yassir Benhammou, Suman Kalyan, Sujay Kumar

The strength of multimodal learning lies in its ability to integrate information from various sources, providing rich and comprehensive insights. However, in real-world scenarios, multi-modal systems often face the challenge of dynamic…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-03 · Xiyuan Gao, Bing Cao, Pengfei Zhu, Nannan Wang, Qinghua Hu

Multimodal learning faces two major challenges: modality imbalance and data noise, which significantly affect the robustness and generalization ability of models. Existing methods achieve modality balance by suppressing dominant modalities,…

Multimedia · Computer Science · 2025-11-17 · Zijing Xu, Yunfeng Kou, Kunming Wu, Hong Liu

Multimodal learning has significantly enhanced machine learning performance but still faces numerous challenges and limitations. Imbalanced multimodal learning is one of the problems extensively studied in recent works and is typically…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-04 · Shu Shen, C. L. Philip Chen, Tong Zhang

Multimodal networks have demonstrated remarkable performance improvements over their unimodal counterparts. Existing multimodal networks are designed in a multi-branch fashion that, due to the reliance on fusion strategies, exhibit…

This paper investigates how to better leverage large-scale pre-trained uni-modal models to further enhance discriminative multi-modal learning. Even when fine-tuned with only uni-modal data, these models can outperform previous multi-modal…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-10 · Chenzhuang Du, Yue Zhao, Chonghua Liao, Jiacheng You, Jie Fu, Hang Zhao