Related papers: Improving Multimodal Learning with Multi-Loss Grad…
Multimodal learning integrates information from different modalities to enhance model performance, yet it often suffers from modality imbalance, where dominant modalities overshadow weaker ones during joint optimization. This paper reveals…
Different modalities hold considerable gaps in optimization trajectories, including speeds and paths, which lead to modality laziness and modality clash when jointly training multimodal models, resulting in insufficient and imbalanced…
Multimodal learning has developed very fast in recent years. However, during the multimodal training process, the model tends to rely on only one modality based on which it could learn faster, thus leading to inadequate use of other…
Learning multi-modal representations is an essential step towards real-world robotic applications, and various multi-modal fusion models have been developed for this purpose. However, we observe that existing models, whose objectives are…
Multimodal learning helps to comprehensively understand the world, by integrating different senses. Accordingly, multiple input modalities are expected to boost model performance, but we actually find that they are not fully exploited even…
To overcome the imbalanced multimodal learning problem, where models prefer the training of specific modalities, existing methods propose to control the training of uni-modal encoders from different perspectives, taking the inter-modal…
Multimodal Federated Learning frequently encounters challenges of client modality heterogeneity, leading to undesired performances for secondary modality in multimodal learning. It is particularly prevalent in audiovisual learning, with…
The cross-modal retrieval model leverages the potential of triple loss optimization to learn robust embedding spaces. However, existing methods often train these models in a singular pass, overlooking the distinction between semi-hard and…
Multimodal learning integrates diverse modalities but suffers from modality imbalance, where dominant modalities suppress weaker ones due to inconsistent convergence rates. Existing methods predominantly rely on static modulation or…
Audio-visual learning helps to comprehensively understand the world by fusing practical information from multiple modalities. However, recent studies show that the imbalanced optimization of uni-modal encoders in a joint-learning model is a…
Multimodal fusion is susceptible to modality imbalance, where dominant modalities overshadow weak ones, easily leading to biased learning and suboptimal fusion, especially for incomplete modality conditions. To address this problem, we…
Multimodal learning often encounters the under-optimized problem and may have worse performance than unimodal learning. Existing methods attribute this problem to the imbalanced learning between modalities and rebalance them through…
Multimodal learning, which integrates data from diverse sensory modes, plays a pivotal role in artificial intelligence. However, existing multimodal learning methods often struggle with challenges where some modalities appear more dominant…
Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text. In this work, we are concerned with understanding how models behave as the type of modalities differ between training…
Broadcast and media organizations increasingly rely on artificial intelligence to automate the labor-intensive processes of content indexing, tagging, and metadata generation. However, existing AI systems typically operate on a single…
The strength of multimodal learning lies in its ability to integrate information from various sources, providing rich and comprehensive insights. However, in real-world scenarios, multi-modal systems often face the challenge of dynamic…
Multimodal learning faces two major challenges: modality imbalance and data noise, which significantly affect the robustness and generalization ability of models. Existing methods achieve modality balance by suppressing dominant modalities,…
Multimodal learning has significantly enhanced machine learning performance but still faces numerous challenges and limitations. Imbalanced multimodal learning is one of the problems extensively studied in recent works and is typically…
Multimodal networks have demonstrated remarkable performance improvements over their unimodal counterparts. Existing multimodal networks are designed in a multi-branch fashion that, due to the reliance on fusion strategies, exhibit…
This paper investigates how to better leverage large-scale pre-trained uni-modal models to further enhance discriminative multi-modal learning. Even when fine-tuned with only uni-modal data, these models can outperform previous multi-modal…