Related papers: Supervised cross-modal factor analysis for multipl…

VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification

Multimodal learning from document data has achieved great success lately, as it allows pre-training semantically meaningful features as a prior for a learnable downstream task. In this paper, we approach the document classification problem…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-12 · Souhail Bakkali, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, Oriol Ramos Terrades

Cross-Modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Representation Learning

Computational food analysis (CFA) naturally requires multi-modal evidence of a particular food, e.g., images, recipe text, etc. A key to making CFA possible is multi-modal shared representation learning, which aims to create a joint…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-01 · Ricardo Guerrero, Hai Xuan Pham, Vladimir Pavlovic

Multimodal Subspace Support Vector Data Description

In this paper, we propose a novel method for projecting data from multiple modalities to a new subspace optimized for one-class classification. The proposed method iteratively transforms the data from the original feature space of each…

Machine Learning · Computer Science · 2020-09-15 · Fahad Sohrab, Jenni Raitoharju, Alexandros Iosifidis, Moncef Gabbouj

Multi-modal Semantic Understanding with Contrastive Cross-modal Feature Alignment

Multi-modal semantic understanding requires integrating information from different modalities to extract users' real intention behind words. Most previous work applies a dual-encoder structure to separately encode image and text, but fails…

Computation and Language · Computer Science · 2024-03-12 · Ming Zhang, Ke Chang, Yunfang Wu

Factor Analysis with Correlated Topic Model for Multi-Modal Data

Integrating various data modalities brings valuable insights into underlying phenomena. Multimodal factor analysis (FA) uncovers shared axes of variation underlying different simple data modalities, where each sample is represented by a…

Machine Learning · Computer Science · 2025-04-29 · Małgorzata Łazęcka, Ewa Szczurek

Learning Factorized Multimodal Representations

Learning multimodal representations is a fundamentally complex research problem due to the presence of multiple heterogeneous sources of information. Although the presence of multiple modalities provides additional valuable information,…

Machine Learning · Computer Science · 2019-05-15 · Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, Ruslan Salakhutdinov

MutualFormer: Multi-Modality Representation Learning via Cross-Diffusion Attention

Aggregating multi-modality data to obtain reliable data representation attracts more and more attention. Recent studies demonstrate that Transformer models usually work well for multi-modality tasks. Existing Transformers generally either…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-17 · Xixi Wang, Xiao Wang, Bo Jiang, Jin Tang, Bin Luo

Adaptive Cross-Modal Few-Shot Learning

Metric-based meta-learning techniques have successfully been applied to few-shot classification problems. In this paper, we propose to leverage cross-modal information to enhance metric-based few-shot learning methods. Visual and semantic…

Machine Learning · Computer Science · 2020-02-19 · Chen Xing, Negar Rostamzadeh, Boris N. Oreshkin, Pedro O. Pinheiro

Semi-supervised multi-view concept decomposition

Concept Factorization (CF), as a novel paradigm of representation learning, has demonstrated superior performance in multi-view clustering tasks. It overcomes limitations such as the non-negativity constraint imposed by traditional matrix…

Machine Learning · Computer Science · 2023-07-04 · Qi Jiang, Guoxu Zhou, Qibin Zhao

Two-Stream Video Classification with Cross-Modality Attention

Fusing multi-modality information is known to bring significant improvement in video classification. However, the most popular method up to now is still simply fusing each stream's prediction scores at the last stage.…

Computer Vision and Pattern Recognition · Computer Science · 2019-08-02 · Lu Chi, Guiyu Tian, Yadong Mu, Qi Tian

CMA-CLIP: Cross-Modality Attention CLIP for Image-Text Classification

Modern Web systems such as social media and e-commerce contain rich contents expressed in images and text. Leveraging information from multi-modalities can improve the performance of machine learning tasks such as classification and…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-10 · Huidong Liu, Shaoyuan Xu, Jinmiao Fu, Yang Liu, Ning Xie, Chien-Chih Wang, Bryan Wang, Yi Sun

Self-supervised Modal and View Invariant Feature Learning

Most of the existing self-supervised feature learning methods for 3D data either learn 3D features from point cloud data or from multi-view images. By exploring the inherent multi-modality attributes of 3D objects, in this paper, we propose…

Computer Vision and Pattern Recognition · Computer Science · 2020-05-29 · Longlong Jing, Yucheng Chen, Ling Zhang, Mingyi He, Yingli Tian

Semi-supervised Feature Analysis by Mining Correlations among Multiple Tasks

In this paper, we propose a novel semi-supervised feature selection framework by mining correlations among multiple tasks and apply it to different multimedia applications. Instead of independently computing the importance of features for…

Machine Learning · Computer Science · 2017-07-11 · Xiaojun Chang, Yi Yang

Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences

The success of supervised learning requires large-scale ground truth labels which are very expensive, time-consuming, or may need special skills to annotate. To address this issue, many self- or un-supervised methods are developed. Unlike…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-14 · Longlong Jing, Yucheng Chen, Ling Zhang, Mingyi He, Yingli Tian

Multi-Modal Image Fusion via Intervention-Stable Feature Learning

Multi-modal image fusion integrates complementary information from different modalities into a unified representation. Current methods predominantly optimize statistical correlations between modalities, often capturing dataset-induced…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-25 · Xue Wang, Zheng Guan, Wenhua Qian, Chengchao Wang, Runzhuo Ma

Multimodal Sentiment Analysis Based on Causal Reasoning

With the rapid development of multimedia, the shift from unimodal textual sentiment analysis to multimodal image-text sentiment analysis has obtained academic and industrial attention in recent years. However, multimodal sentiment analysis…

Multimedia · Computer Science · 2024-12-11 · Fuhai Chen, Pengpeng Huang, Xuri Ge, Jie Huang, Zishuo Bao

Deep Multi-View Learning via Task-Optimal CCA

Canonical Correlation Analysis (CCA) is widely used for multimodal data analysis and, more recently, for discriminative tasks such as multi-view learning; however, it makes no use of class labels. Recent CCA methods have started to address…

Machine Learning · Computer Science · 2019-07-19 · Heather D. Couture, Roland Kwitt, J. S. Marron, Melissa Troester, Charles M. Perou, Marc Niethammer

CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features

Multimodal encoders like CLIP excel in tasks such as zero-shot image classification and cross-modal retrieval. However, they require excessive training data. We propose canonical similarity analysis (CSA), which uses two unimodal encoders…

Machine Learning · Computer Science · 2025-03-17 · Po-han Li, Sandeep P. Chinchali, Ufuk Topcu

Semantic-Space-Intervened Diffusive Alignment for Visual Classification

Cross-modal alignment is an effective approach to improving visual classification. Existing studies typically enforce a one-step mapping that uses deep neural networks to project the visual features to mimic the distribution of textual…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-27 · Zixuan Li, Lei Meng, Guoqing Chao, Wei Wu, Xiaoshuo Yan, Yimeng Yang, Zhuang Qi, Xiangxu Meng

Improving the Modality Representation with Multi-View Contrastive Learning for Multimodal Sentiment Analysis

Modality representation learning is an important problem for multimodal sentiment analysis (MSA), since highly distinguishable representations can contribute to improving the analysis effect. Previous works on MSA have usually focused…

Multimedia · Computer Science · 2023-01-31 · Peipei Liu, Xin Zheng, Hong Li, Jie Liu, Yimo Ren, Hongsong Zhu, Limin Sun