Related papers: Learning Multi-Modal Nonlinear Embeddings: Perform…

Nonlinear Supervised Dimensionality Reduction via Smooth Regular Embeddings

The recovery of the intrinsic geometric structures of data collections is an important problem in data analysis. Supervised extensions of several manifold learning approaches have been proposed in the recent years. Meanwhile, existing…

Computer Vision and Pattern Recognition · Computer Science · 2018-05-29 · Cem Ornek, Elif Vural

A study of the classification of low-dimensional data with supervised manifold learning

Supervised manifold learning methods learn data representations by preserving the geometric structure of data while enhancing the separation between data samples from different classes. In this work, we propose a theoretical study of…

Machine Learning · Computer Science · 2018-01-08 · Elif Vural, Christine Guillemot

Simple to Complex Cross-modal Learning to Rank

The heterogeneity-gap between different modalities brings a significant challenge to multimedia information retrieval. Some studies formalize the cross-modal retrieval tasks as a ranking problem and learn a shared multi-modal embedding…

Machine Learning · Computer Science · 2017-07-11 · Minnan Luo, Xiaojun Chang, Zhihui Li, Liqiang Nie, Alexander G. Hauptmann, Qinghua Zheng

Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models

Multimodal learning typically relies on the assumption that all modalities are fully available during both the training and inference phases. However, in real-world scenarios, consistently acquiring complete multimodal data presents…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-18 · Donggeun Kim, Taesup Kim

Understanding the Emergence of Multimodal Representation Alignment

Multimodal representation learning is fundamentally about transforming incomparable modalities into comparable representations. While prior research primarily focused on explicitly aligning these representations through targeted learning…

Machine Learning · Computer Science · 2025-06-16 · Megan Tjandrasuwita, Chanakya Ekbote, Liu Ziyin, Paul Pu Liang

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

Deep learning methods have revolutionized speech recognition, image recognition, and natural language processing since 2010. Each of these tasks involves a single modality in their input signals. However, many applications in the artificial…

Artificial Intelligence · Computer Science · 2020-07-15 · Chao Zhang, Zichao Yang, Xiaodong He, Li Deng

Unsupervised Multimodal Language Representations using Convolutional Autoencoders

Multimodal Language Analysis is a demanding area of research, since it is associated with two requirements: combining different modalities and capturing temporal information. During the last years, several works have been proposed in the…

Computation and Language · Computer Science · 2022-01-10 · Panagiotis Koromilas, Theodoros Giannakopoulos

Learning Unseen Modality Interaction

Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. In this paper, we challenge this modality-complete assumption for multimodal learning and instead strive…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-26 · Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

Multimodal Understanding Through Correlation Maximization and Minimization

Multimodal learning has mainly focused on learning large models on, and fusing feature representations from, different modalities for better performances on downstream tasks. In this work, we take a detour from this trend and study the…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-08 · Yifeng Shi, Marc Niethammer

A Theory of Multimodal Learning

Human perception of the empirical world involves recognizing the diverse appearances, or 'modalities', of underlying objects. Despite the longstanding consideration of this perspective in philosophy and cognitive science, the study of…

Machine Learning · Computer Science · 2023-12-19 · Zhou Lu

Efficient Optimization Methods for Extreme Similarity Learning with Nonlinear Embeddings

We study the problem of learning similarity by using nonlinear embedding models (e.g., neural networks) from all possible pairs. This problem is well-known for its difficulty of training with the extreme number of pairs. For the special…

Machine Learning · Statistics · 2021-06-16 · Bowen Yuan, Yu-Sheng Li, Pengrui Quan, Chih-Jen Lin

Quantifying Multimodal Capabilities: Formal Generalization Guarantees in Pairwise Metric Learning

Multimodal learning leverages the integration of diverse data modalities to enhance performance in complex tasks. Yet, it frequently encounters incomplete or redundant modality data in real-world scenarios. This paper presents a…

Machine Learning · Computer Science · 2026-05-05 · Richeng Zhou, Xuelin Zhang, Liyuan Liu

Unsupervised Multimodal Representation Learning across Medical Images and Reports

Joint embeddings between medical imaging modalities and associated radiology reports have the potential to offer significant benefits to the clinical community, ranging from cross-domain retrieval to conditional generation of reports to the…

Machine Learning · Computer Science · 2018-11-28 · Tzu-Ming Harry Hsu, Wei-Hung Weng, Willie Boag, Matthew McDermott, Peter Szolovits

Continual learning in cross-modal retrieval

Multimodal representations and continual learning are two areas closely related to human intelligence. The former considers the learning of shared representation spaces where information from different modalities can be compared and…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-20 · Kai Wang, Luis Herranz, Joost van de Weijer

Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications

Multimodality Representation Learning, as a technique of learning to embed information from different modalities and their correlations, has achieved remarkable success on a variety of applications, such as Visual Question Answering (VQA),…

Artificial Intelligence · Computer Science · 2024-03-04 · Muhammad Arslan Manzoor, Sarah Albarri, Ziting Xian, Zaiqiao Meng, Preslav Nakov, Shangsong Liang

Deep Multi-Modal Sets

Many vision-related tasks benefit from reasoning over multiple modalities to leverage complementary views of data in an attempt to learn robust embedding spaces. Most deep learning-based methods rely on a late fusion technique whereby…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-04 · Austin Reiter, Menglin Jia, Pu Yang, Ser-Nam Lim

What Makes Training Multi-Modal Classification Networks Hard?

Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart. In our…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-06 · Weiyao Wang, Du Tran, Matt Feiszli

Remarks on Lipschitz-Minimal Interpolation: Generalization Bounds and Neural Network Implementation

This note establishes a theoretical framework for finding (potentially overparameterized) approximations of a function on a compact set with a-priori bounds for the generalization error. The approximation method considered is to choose,…

Systems and Control · Electrical Eng. & Systems · 2026-03-23 · Arthur C. B. de Oliveira, Ruigang Wang, Ian R. Manchester, Eduardo D. Sontag

Layer-Specific Lipschitz Modulation for Fault-Tolerant Multimodal Representation Learning

Modern multimodal systems deployed in industrial and safety-critical environments must remain reliable under partial sensor failures, signal degradation, or cross-modal inconsistencies. This work introduces a mathematically grounded…

Machine Learning · Computer Science · 2026-03-27 · Diyar Altinses, Andreas Schwung

Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation

Continual learning is essential for adapting models to new tasks while retaining previously acquired knowledge. While existing approaches predominantly focus on uni-modal data, multi-modal learning offers substantial benefits by utilizing…

Machine Learning · Computer Science · 2025-11-11 · Evelyn Chee, Wynne Hsu, Mong Li Lee