Related papers: Harmonized Multimodal Learning with Gaussian Proce…

Multimodal LLMs under Pairwise Modalities

Despite the impressive results achieved by multimodal large language models (MLLMs), their training typically relies on jointly curated multimodal data, requiring substantial human effort to construct multi-way aligned datasets and thereby…

Computer Vision and Pattern Recognition · Computer Science 2026-05-21 Yan Li, Yunlong Deng, Yuewen Sun, Gongxu Luo, Kun Zhang, Guangyi Chen

Multimodal Understanding Through Correlation Maximization and Minimization

Multimodal learning has mainly focused on learning large models on, and fusing feature representations from, different modalities for better performances on downstream tasks. In this work, we take a detour from this trend and study the…

Computer Vision and Pattern Recognition · Computer Science 2023-05-08 Yifeng Shi, Marc Niethammer

Cross-Modal Consistency in Multimodal Large Language Models

Recent developments in multimodal methodologies have marked the beginning of an exciting era for models adept at processing diverse data types, encompassing text, audio, and visual content. Models like GPT-4V, which merge computer vision…

Computation and Language · Computer Science 2024-11-15 Xiang Zhang, Senyu Li, Ning Shi, Bradley Hauer, Zijun Wu, Grzegorz Kondrak, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

Multi-View Oriented GPLVM: Expressiveness and Efficiency

The multi-view Gaussian process latent variable model (MV-GPLVM) aims to learn a unified representation from multi-view data but is hindered by challenges such as limited kernel expressiveness and low computational efficiency. To overcome…

Machine Learning · Statistics 2025-12-16 Zi Yang, Ying Li, Zhidi Lin, Michael Minyi Zhang, Pablo M. Olmos

Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models

Vision-Language Models (VLMs) learn joint representations by mapping images and text into a shared latent space. However, recent research highlights that deterministic embeddings from standard VLMs often struggle to capture the…

Computer Vision and Pattern Recognition · Computer Science 2025-07-08 Aishwarya Venkataramanan, Paul Bodesheim, Joachim Denzler

Geometric Multimodal Contrastive Representation Learning

Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we…

Machine Learning · Computer Science 2022-11-21 Petra Poklukar, Miguel Vasco, Hang Yin, Francisco S. Melo, Ana Paiva, Danica Kragic

Preventing Model Collapse in Gaussian Process Latent Variable Models

Gaussian process latent variable models (GPLVMs) are a versatile family of unsupervised learning models commonly used for dimensionality reduction. However, common challenges in modeling data with GPLVMs include inadequate kernel…

Machine Learning · Statistics 2024-06-19 Ying Li, Zhidi Lin, Feng Yin, Michael Minyi Zhang

Generalised Gaussian Process Latent Variable Models (GPLVM) with Stochastic Variational Inference

Gaussian process latent variable models (GPLVM) are a flexible and non-linear approach to dimensionality reduction, extending classical Gaussian processes to an unsupervised learning context. The Bayesian incarnation of the GPLVM Titsias…

Machine Learning · Computer Science 2022-10-31 Vidhi Lalchand, Aditya Ravuri, Neil D. Lawrence

Multi-modal Gaussian Process Variational Autoencoders for Neural and Behavioral Data

Characterizing the relationship between neural population activity and behavioral data is a central goal of neuroscience. While latent variable models (LVMs) are successful in describing high-dimensional time-series data, they are typically…

Machine Learning · Computer Science 2026-02-12 Rabia Gondur, Usama Bin Sikandar, Evan Schaffer, Mikio Christian Aoi, Stephen L Keeley

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

Multimodal learning has developed very fast in recent years. However, during the multimodal training process, the model tends to rely on only one modality based on which it could learn faster, thus leading to inadequate use of other…

Machine Learning · Computer Science 2024-11-05 Zirun Guo, Tao Jin, Jingyuan Chen, Zhou Zhao

Invariant Gaussian Process Latent Variable Models and Application in Causal Discovery

In nonlinear latent variable models or dynamic models, if we consider the latent variables as confounders (common causes), the noise dependencies imply further relations between the observed variables. Such models are then closely related…

Machine Learning · Computer Science 2012-03-19 Kun Zhang, Bernhard Schoelkopf, Dominik Janzing

Balanced Multimodal Learning via On-the-fly Gradient Modulation

Multimodal learning helps to comprehensively understand the world, by integrating different senses. Accordingly, multiple input modalities are expected to boost model performance, but we actually find that they are not fully exploited even…

Computer Vision and Pattern Recognition · Computer Science 2022-03-30 Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu

Modulating Scalable Gaussian Processes for Expressive Statistical Learning

For a learning task, Gaussian process (GP) is interested in learning the statistical relationship between inputs and outputs, since it offers not only the prediction mean but also the associated variability. The vanilla GP however struggles…

Machine Learning · Statistics 2020-09-01 Haitao Liu, Yew-Soon Ong, Xiaomo Jiang, Xiaofang Wang

Gaussian Mixture Modeling with Gaussian Process Latent Variable Models

Density modeling is notoriously difficult for high dimensional data. One approach to the problem is to search for a lower dimensional manifold which captures the main characteristics of the data. Recently, the Gaussian Process Latent…

Machine Learning · Statistics 2010-07-14 Hannes Nickisch, Carl Edward Rasmussen

Learning Factorized Multimodal Representations

Learning multimodal representations is a fundamentally complex research problem due to the presence of multiple heterogeneous sources of information. Although the presence of multiple modalities provides additional valuable information,…

Machine Learning · Computer Science 2019-05-15 Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, Ruslan Salakhutdinov

Quantifying Multimodal Imbalance: A GMM-Guided Adaptive Loss for Audio-Visual Learning

Multimodal learning integrates diverse modalities but suffers from modality imbalance, where dominant modalities suppress weaker ones due to inconsistent convergence rates. Existing methods predominantly rely on static modulation or…

Machine Learning · Computer Science 2026-02-11 Zhaocheng Liu, Zhiwen Yu, Xiaoqing Liu

Gramian Multimodal Representation Learning and Alignment

Human perception integrates multiple modalities, such as vision, hearing, and language, into a unified understanding of the surrounding reality. While recent multimodal models have achieved significant progress by aligning pairs of…

Computer Vision and Pattern Recognition · Computer Science 2025-02-13 Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo, Danilo Comminiello

Quantifying Multimodal Capabilities: Formal Generalization Guarantees in Pairwise Metric Learning

Multimodal learning leverages the integration of diverse data modalities to enhance performance in complex tasks. Yet, it frequently encounters incomplete or redundant modality data in real-world scenarios. This paper presents a…

Machine Learning · Computer Science 2026-05-05 Richeng Zhou, Xuelin Zhang, Liyuan Liu

Learning Gaussian Graphical Models by symmetric parallel regression technique

In this contribution we deal with the problem of learning an undirected graph which encodes the conditional dependence relationship between variables of a complex system, given a set of observations of this system. This is a very central…

Methodology · Statistics 2019-07-26 Daniela De Canditiis, Armando Guardasole

Enhancing multimodal cooperation via sample-level modality valuation

One primary topic of multimodal learning is to jointly incorporate heterogeneous information from different modalities. However, most models often suffer from unsatisfactory multimodal cooperation which cannot jointly utilize all modalities…

Computer Vision and Pattern Recognition · Computer Science 2024-06-17 Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu