Multi-Modality Collaborative Learning for Sentiment Analysis

Shanmin Wang; Chengguang Liu; Qingshan Liu

Machine Learning; Artificial Intelligence; Information Retrieval. 2025-12-19 (v2)

Abstract

Multimodal sentiment analysis (MSA) identifies individuals' sentiment states in videos by integrating visual, audio, and text modalities. Despite progress in existing methods, inherent modality heterogeneity limits the effective capture of interactive sentiment features across modalities. In this paper, we introduce a Multi-Modality Collaborative Learning (MMCL) framework that facilitates cross-modal interactions and captures enhanced features from modality-common representations and complementary features from modality-specific representations. Specifically, we design a parameter-free decoupling module that separates each modality's representation into modality-common and modality-specific components by assessing the semantics of cross-modal elements. For modality-specific representations, inspired by the act-reward mechanism in reinforcement learning, we design policy models that adaptively mine complementary sentiment features under the guidance of a joint reward. For modality-common representations, intra-modal attention highlights crucial components, which play enhanced roles among modalities. Experimental results, including comparative evaluations on four databases, effectiveness verification of each module, and assessment of complementary features, demonstrate that MMCL successfully learns collaborative features across modalities and significantly improves performance. The code is available at https://github.com/smwanghhh/MMCL.
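
The abstract does not spell out how the parameter-free decoupling module scores cross-modal elements, so the following is only an illustrative sketch of one way such a split could work, not the authors' implementation. It assumes each modality is a list of element feature vectors in a shared feature space, scores every element of one modality by its mean cosine similarity to the elements of the other modalities, and uses the mean score as the threshold. The function names `cosine` and `decouple`, the similarity measure, and the threshold rule are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def decouple(target, others):
    """Split the elements of `target` into common and specific index lists.

    target: list of feature vectors for one modality.
    others: list of other modalities, each a list of feature vectors.

    Assumed rule (not taken from the paper): an element is "common" when its
    mean cosine similarity to all other-modality elements is at least the
    mean score over the target modality; otherwise it is "specific".
    No learned weights are involved, so the split is parameter-free.
    """
    other_elems = [e for modality in others for e in modality]
    if not target or not other_elems:
        raise ValueError("target and others must both contain elements")
    scores = [
        sum(cosine(x, y) for y in other_elems) / len(other_elems)
        for x in target
    ]
    threshold = sum(scores) / len(scores)
    common = [i for i, s in enumerate(scores) if s >= threshold]
    specific = [i for i, s in enumerate(scores) if s < threshold]
    return common, specific, scores


# Toy example: the first text element aligns with audio and visual, the second does not.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.1]]
visual = [[0.9, 0.0]]
common_idx, specific_idx, _ = decouple(text, [audio, visual])
print(common_idx, specific_idx)
```

In this toy setting the first text element lands in the common set and the second in the specific set. A similarity-based rule like this one presumes that the modalities already share a comparable feature space.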

Keywords

multimodal emotion recognition; multimodal learning; affective computing

Cite

@article{arxiv.2501.12424,
  title  = {Multi-Modality Collaborative Learning for Sentiment Analysis},
  author = {Shanmin Wang and Chengguang Liu and Qingshan Liu},
  journal= {arXiv preprint arXiv:2501.12424},
  year   = {2025}
}

Comments

The method has flaws, especially in the decoupling module: the decoupling process does not take into account the heterogeneity of the three modalities' data or the differences in their distributions.
