Related papers: Multi-modal Latent Diffusion

A survey of multimodal deep generative models

Multimodal learning is a framework for building models that make predictions based on different types of modalities. Important challenges in multimodal learning are the inference of shared representations from arbitrary modalities and…

Machine Learning · Computer Science 2022-07-06 Masahiro Suzuki, Yutaka Matsuo

Latent Diffusion for Language Generation

Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have…

Computation and Language · Computer Science 2023-11-08 Justin Lovelace, Varsha Kishore, Chao Wan, Eliot Shekhtman, Kilian Q. Weinberger

Unity by Diversity: Improved Representation Learning in Multimodal VAEs

Variational Autoencoders for multimodal data hold promise for many tasks in data analysis, such as representation learning, conditional generation, and imputation. Current architectures either share the encoder output, decoder input, or…

Machine Learning · Computer Science 2025-01-08 Thomas M. Sutter, Yang Meng, Andrea Agostini, Daphné Chopard, Norbert Fortin, Julia E. Vogt, Babak Shahbaba, Stephan Mandt

Conditional Generative Modeling via Learning the Latent Space

Although deep learning has achieved appealing results on several machine learning tasks, most of the models are deterministic at inference, limiting their application to single-modal settings. We propose a novel general-purpose framework…

Machine Learning · Computer Science 2020-10-12 Sameera Ramasinghe, Kanchana Ranasinghe, Salman Khan, Nick Barnes, Stephen Gould

Multimodal ELBO with Diffusion Decoders

Multimodal variational autoencoders have demonstrated their ability to learn the relationships between different modalities by mapping them into a latent representation. Their design and capacity to perform any-to-any conditional and…

Machine Learning · Computer Science 2025-02-04 Daniel Wesego, Pedram Rooshenas

Bridging the inference gap in Multimodal Variational Autoencoders

From medical diagnosis to autonomous vehicles, critical applications rely on the integration of multiple heterogeneous data modalities. Multimodal Variational Autoencoders offer versatile and scalable methods for generating unobserved…

Machine Learning · Computer Science 2025-02-07 Agathe Senellart, Stéphanie Allassonnière

Learning Sequential Latent Variable Models from Multimodal Time Series Data

Sequential modelling of high-dimensional data is an important problem that appears in many domains including model-based reinforcement learning and dynamics identification for control. Latent variable models applied to sequential data…

Machine Learning · Computer Science 2023-01-23 Oliver Limoyo, Trevor Ablett, Jonathan Kelly

Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction

Generative models have demonstrated strong performance in conditional settings and can be viewed as a form of data compression, where the condition serves as a compact representation. However, their limited controllability and…

Machine Learning · Computer Science 2025-07-04 Xiao Li, Liangji Zhu, Anand Rangarajan, Sanjay Ranka

Ladder Variational Autoencoders

Variational Autoencoders are powerful models for unsupervised learning. However deep models with several layers of dependent stochastic variables are difficult to train which limits the improvements obtained using these highly expressive…

Machine Learning · Statistics 2016-05-30 Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, Ole Winther

Diffusion Models For Multi-Modal Generative Modeling

Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to a single-generation modeling. Can we generalize diffusion models with the ability of…

Computer Vision and Pattern Recognition · Computer Science 2024-09-26 Changyou Chen, Han Ding, Bunyamin Sisman, Yi Xu, Ouye Xie, Benjamin Z. Yao, Son Dinh Tran, Belinda Zeng

Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces

Diffusion models have demonstrated remarkable performance in generating unimodal data across various tasks, including image, video, and text generation. On the contrary, the joint generation of multimodal data through diffusion models is…

Machine Learning · Computer Science 2025-06-16 Kevin Rojas, Yuchen Zhu, Sichen Zhu, Felix X.-F. Ye, Molei Tao

Increasing the Generalisation Capacity of Conditional VAEs

We address the problem of one-to-many mappings in supervised learning, where a single instance has many different solutions of possibly equal cost. The framework of conditional variational autoencoders describes a class of methods to tackle…

Machine Learning · Statistics 2019-09-11 Alexej Klushyn, Nutan Chen, Botond Cseke, Justin Bayer, Patrick van der Smagt

Factorized Video Autoencoders for Efficient Generative Modelling

Latent variable generative models have emerged as powerful tools for generative tasks including image and video synthesis. These models are enabled by pretrained autoencoders that map high resolution data into a compressed lower dimensional…

Computer Vision and Pattern Recognition · Computer Science 2025-06-13 Mohammed Suhail, Carlos Esteves, Leonid Sigal, Ameesh Makadia

Learning a Multi-Modal Policy via Imitating Demonstrations with Mixed Behaviors

We propose a novel approach to train a multi-modal policy from mixed demonstrations without their behavior labels. We develop a method to discover the latent factors of variation in the demonstrations. Specifically, our method is based on…

Machine Learning · Computer Science 2019-03-26 Fang-I Hsiao, Jui-Hsuan Kuo, Min Sun

The Learnability Gap in Medical Latent Diffusion

Generative data augmentation with latent diffusion models is a promising strategy for addressing class imbalance in medical imaging, yet current approaches focus on perceptual fidelity and domain-specific autoencoder fine-tuning while…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Mischa Dombrowski, Felix Nützel, Bernhard Kainz

Collaborative Diffusion for Multi-Modal Face Generation and Editing

Diffusion models arise as a powerful generative tool recently. Despite the great progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further…

Computer Vision and Pattern Recognition · Computer Science 2023-04-21 Ziqi Huang, Kelvin C. K. Chan, Yuming Jiang, Ziwei Liu

Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

While recent neural encoder-decoder models have shown great promise in modeling open-domain conversations, they often generate dull and generic responses. Unlike past work that has focused on diversifying the output of the decoder at…

Computation and Language · Computer Science 2017-10-24 Tiancheng Zhao, Ran Zhao, Maxine Eskenazi

Variational methods for Conditional Multimodal Deep Learning

In this paper, we address the problem of conditional modality learning, whereby one is interested in generating one modality given the other. While it is straightforward to learn a joint distribution over multiple modalities using a deep…

Computer Vision and Pattern Recognition · Computer Science 2016-08-29 Gaurav Pandey, Ambedkar Dukkipati

DiffSDA: Unsupervised Diffusion Sequential Disentanglement Across Modalities

Unsupervised representation learning, particularly sequential disentanglement, aims to separate static and dynamic factors of variation in data without relying on labels. This remains a challenging problem, as existing approaches based on…

Machine Learning · Computer Science 2025-10-08 Hedi Zisling, Ilan Naiman, Nimrod Berman, Supasorn Suwajanakorn, Omri Azencot

On Designing Diffusion Autoencoders for Efficient Generation and Representation Learning

Diffusion autoencoders (DAs) are variants of diffusion generative models that use an input-dependent latent variable to capture representations alongside the diffusion process. These representations, to varying extents, can be used for…

Machine Learning · Computer Science 2025-06-03 Magdalena Proszewska, Nikolay Malkin, N. Siddharth