Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

Machine Learning | 2024-11-05 | v1 | Computation and Language | Computer Vision and Pattern Recognition

Abstract

Multimodal learning has advanced rapidly in recent years. During multimodal training, however, the model tends to rely on the single modality from which it learns fastest, leaving the other modalities underused. Existing methods for balancing the training process are limited in the loss functions, optimizers and number of modalities they support, and they modulate only the magnitude of the gradients while ignoring their direction. To address these problems, we present Classifier-Guided Gradient Modulation (CGGM), a novel method for balancing multimodal learning that considers both the magnitude and the direction of the gradients. We conduct extensive experiments on four multimodal datasets: UPMC-Food 101, CMU-MOSI, IEMOCAP and BraTS 2021, covering classification, regression and segmentation tasks. The results show that CGGM consistently outperforms all the baselines and other state-of-the-art methods, demonstrating its effectiveness and versatility. Our code is available at https://github.com/zrguo/CGGM.

Cite

@article{arxiv.2411.01409,
  title  = {Classifier-guided Gradient Modulation for Enhanced Multimodal Learning},
  author = {Zirun Guo and Tao Jin and Jingyuan Chen and Zhou Zhao},
  journal= {arXiv preprint arXiv:2411.01409},
  year   = {2024}
}

Comments

Accepted at NeurIPS 2024