Related papers: Using Multiple Instance Learning to Build Multimod…

A Mathematical Perspective On Contrastive Learning

Multimodal contrastive learning is a methodology for linking different data modalities; the canonical example is linking image and text data. The methodology is typically framed as the identification of a set of encoders, one for each…

Machine Learning · Statistics 2025-06-02 Ricardo Baptista, Andrew M. Stuart, Son Tran

Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning

We perform a comprehensive benchmarking of contrastive frameworks for learning multimodal representations in the medical domain. Through this study, we aim to answer the following research questions: (i) How transferable are general-domain…

Computer Vision and Pattern Recognition · Computer Science 2024-06-12 Shuvendu Roy, Yasaman Parhizkar, Franklin Ogidi, Vahid Reza Khazaie, Michael Colacci, Ali Etemad, Elham Dolatabadi, Arash Afkanpour

Multimodal Contrastive Training for Visual Representation Learning

We develop an approach to learning visual representations that embraces multimodal data, driven by a combination of intra- and inter-modal similarity preservation objectives. Unlike existing visual pre-training methods, which solve a proxy…

Computer Vision and Pattern Recognition · Computer Science 2021-04-28 Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, Michael Maire, Ajinkya Kale, Baldo Faieta

VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification

Multimodal learning from document data has achieved great success lately as it allows pre-training of semantically meaningful features as a prior into a learnable downstream task. In this paper, we approach the document classification problem…

Computer Vision and Pattern Recognition · Computer Science 2023-05-12 Souhail Bakkali, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, Oriol Ramos Terrades

Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications

Multimodality Representation Learning, as a technique of learning to embed information from different modalities and their correlations, has achieved remarkable success on a variety of applications, such as Visual Question Answering (VQA),…

Artificial Intelligence · Computer Science 2024-03-04 Muhammad Arslan Manzoor, Sarah Albarri, Ziting Xian, Zaiqiao Meng, Preslav Nakov, Shangsong Liang

Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation

Continual learning is essential for adapting models to new tasks while retaining previously acquired knowledge. While existing approaches predominantly focus on uni-modal data, multi-modal learning offers substantial benefits by utilizing…

Machine Learning · Computer Science 2025-11-11 Evelyn Chee, Wynne Hsu, Mong Li Lee

Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models

Multimodal learning for generative models often refers to the learning of abstract concepts from the commonality of information in multiple modalities, such as vision and language. While it has proven effective for learning generalisable…

Machine Learning · Computer Science 2021-04-22 Yuge Shi, Brooks Paige, Philip H. S. Torr, N. Siddharth

On the Value of Cross-Modal Misalignment in Multimodal Representation Learning

Multimodal representation learning, exemplified by multimodal contrastive learning (MMCL) using image-text pairs, aims to learn powerful representations by aligning cues across modalities. This approach relies on the core assumption that…

Machine Learning · Computer Science 2025-09-29 Yichao Cai, Yuhang Liu, Erdun Gao, Tianjiao Jiang, Zhen Zhang, Anton van den Hengel, Javen Qinfeng Shi

Improving the Modality Representation with Multi-View Contrastive Learning for Multimodal Sentiment Analysis

Modality representation learning is an important problem for multimodal sentiment analysis (MSA), since the highly distinguishable representations can contribute to improving the analysis effect. Previous works of MSA have usually focused…

Multimedia · Computer Science 2023-01-31 Peipei Liu, Xin Zheng, Hong Li, Jie Liu, Yimo Ren, Hongsong Zhu, Limin Sun

Multimodal Representation Learning Conditioned on Semantic Relations

Multimodal representation learning has been largely driven by contrastive models such as CLIP, which learn a shared embedding space by aligning paired image-text samples. While effective for general-purpose representation learning, such…

Machine Learning · Computer Science 2026-05-12 Yang Qiao, Yuntong Hu, Bowen Zhu, Hasibul Haque, Liang Zhao

Multimodal Representation Learning and Fusion

Multi-modal learning is a fast-growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, like images, text, and audio. By using the strengths of each…

Machine Learning · Computer Science 2025-12-22 Qihang Jin, Enze Ge, Yuhang Xie, Hongying Luo, Junhao Song, Ziqian Bi, Chia Xin Liang, Jibin Guan, Joe Yeong, Xinyuan Song, Junfeng Hao

Multimodal Representation Learning With Text and Images

In recent years, multimodal AI has seen an upward trend as researchers are integrating data of different types such as text, images, speech into modelling to get the best results. This project leverages multimodal AI and matrix…

Machine Learning · Computer Science 2022-05-03 Aishwarya Jayagopal, Ankireddy Monica Aiswarya, Ankita Garg, Srinivasan Kolumam Nandakumar

Robust Multimodal Representation Learning in Healthcare

Medical multimodal representation learning aims to integrate heterogeneous data into unified patient representations to support clinical outcome prediction. However, real-world medical datasets commonly contain systematic biases from…

Machine Learning · Computer Science 2026-05-19 Xiaoguang Zhu, Linxiao Gong, Lianlong Sun, Yang Liu, Haoyu Wang, Jing Liu

Image Pivoting for Learning Multilingual Multimodal Representations

In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model…

Computation and Language · Computer Science 2017-07-25 Spandana Gella, Rico Sennrich, Frank Keller, Mirella Lapata

Multimodal self-supervised learning for lesion localization

Multimodal deep learning utilizing imaging and diagnostic reports has made impressive progress in the field of medical imaging diagnostics, demonstrating a particularly strong capability for auxiliary diagnosis in cases where sufficient…

Computer Vision and Pattern Recognition · Computer Science 2024-08-21 Hao Yang, Hong-Yu Zhou, Cheng Li, Weijian Huang, Jiarun Liu, Yong Liang, Guangming Shi, Hairong Zheng, Qiegen Liu, Shanshan Wang

Multimodal Understanding Through Correlation Maximization and Minimization

Multimodal learning has mainly focused on learning large models on, and fusing feature representations from, different modalities for better performances on downstream tasks. In this work, we take a detour from this trend and study the…

Computer Vision and Pattern Recognition · Computer Science 2023-05-08 Yifeng Shi, Marc Niethammer

A Framework for Learning Invariant Physical Relations in Multimodal Sensory Processing

Perceptual learning enables humans to recognize and represent stimuli invariant to various transformations and build a consistent representation of the self and physical world. Such representations preserve the invariant physical relations…

Neural and Evolutionary Computing · Computer Science 2020-07-02 Du Xiaorui, Yavuzhan Erdem, Immanuel Schweizer, Cristian Axenie

Multi-Label Image Classification with Contrastive Learning

Recently, as an effective way of learning latent representations, contrastive learning has been increasingly popular and successful in various domains. The success of contrastive learning in single-label classifications motivates us to…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Son D. Dao, Ethan Zhao, Dinh Phung, Jianfei Cai

Multi-modal contrastive learning adapts to intrinsic dimensions of shared latent variables

Multi-modal contrastive learning as a self-supervised representation learning technique has achieved great success in foundation model training, such as CLIP (Radford et al., 2021). In this paper, we study the theoretical properties of…

Machine Learning · Statistics 2025-05-20 Yu Gui, Cong Ma, Zongming Ma

Understanding the Emergence of Multimodal Representation Alignment

Multimodal representation learning is fundamentally about transforming incomparable modalities into comparable representations. While prior research primarily focused on explicitly aligning these representations through targeted learning…

Machine Learning · Computer Science 2025-06-16 Megan Tjandrasuwita, Chanakya Ekbote, Liu Ziyin, Paul Pu Liang