Related papers: Multimodal Generative Models for Compositional Rep…

A survey of multimodal deep generative models

Multimodal learning is a framework for building models that make predictions based on different types of modalities. Important challenges in multimodal learning are the inference of shared representations from arbitrary modalities and…

Machine Learning · Computer Science 2022-07-06 Masahiro Suzuki, Yutaka Matsuo

Joint Multimodal Learning with Deep Generative Models

We investigate deep generative models that can exchange multiple modalities bi-directionally, e.g., generating images from corresponding texts and vice versa. Recently, some studies handle multiple modalities on deep generative models, such…

Machine Learning · Statistics 2016-11-08 Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo

Multimodal Generative Models for Scalable Weakly-Supervised Learning

Multiple modalities often co-occur when describing natural phenomena. Learning a joint representation of these modalities should yield deeper and more useful representations. Previous generative approaches to multi-modal input either do not…

Machine Learning · Computer Science 2018-11-13 Mike Wu, Noah Goodman

MHVAE: a Human-Inspired Deep Hierarchical Generative Model for Multimodal Representation Learning

Humans are able to create rich representations of their external reality. Their internal representations allow for cross-modality inference, where available perceptions can induce the perceptual experience of missing input modalities. In…

Machine Learning · Computer Science 2020-06-05 Miguel Vasco, Francisco S. Melo, Ana Paiva

Audio-to-Image Cross-Modal Generation

Cross-modal representation learning allows to integrate information from different modalities into one representation. At the same time, research on generative models tends to focus on the visual domain with less emphasis on other domains,…

Multimedia · Computer Science 2022-08-16 Maciej Żelaszczyk, Jacek Mańdziuk

Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models

Learning generative models that span multiple data modalities, such as vision and language, is often motivated by the desire to learn more useful, generalisable representations that faithfully capture common underlying factors between the…

Machine Learning · Statistics 2019-11-11 Yuge Shi, N. Siddharth, Brooks Paige, Philip H. S. Torr

On the Limitations of Multimodal VAEs

Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. Yet, despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs,…

Machine Learning · Computer Science 2022-04-08 Imant Daunhawer, Thomas M. Sutter, Kieran Chin-Cheong, Emanuele Palumbo, Julia E. Vogt

Discriminative Multimodal Learning via Conditional Priors in Generative Models

Deep generative models with latent variables have been used lately to learn joint representations and generative processes from multi-modal data. These two learning mechanisms can, however, conflict with each other and representations can…

Machine Learning · Computer Science 2023-01-24 Rogelio A. Mancisidor, Michael Kampffmeyer, Kjersti Aas, Robert Jenssen

Private-Shared Disentangled Multimodal VAE for Learning of Hybrid Latent Representations

Multi-modal generative models represent an important family of deep models, whose goal is to facilitate representation learning on data with multiple views or modalities. However, current deep multi-modal models focus on the inference of…

Computer Vision and Pattern Recognition · Computer Science 2020-12-25 Mihee Lee, Vladimir Pavlovic

Multimodal ELBO with Diffusion Decoders

Multimodal variational autoencoders have demonstrated their ability to learn the relationships between different modalities by mapping them into a latent representation. Their design and capacity to perform any-to-any conditional and…

Machine Learning · Computer Science 2025-02-04 Daniel Wesego, Pedram Rooshenas

Improving Bi-directional Generation between Different Modalities with Variational Autoencoders

We investigate deep generative models that can exchange multiple modalities bi-directionally, e.g., generating images from corresponding texts and vice versa. A major approach to achieve this objective is to train a model that integrates…

Machine Learning · Statistics 2018-01-29 Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo

Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives

Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research. Multi-modal Variational Autoencoders (VAEs) have been a popular generative model class that learns latent representations…

Machine Learning · Statistics 2024-09-25 Marcel Hirt, Domenico Campolo, Victoria Leong, Juan-Pablo Ortega

Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models

Multimodal learning for generative models often refers to the learning of abstract concepts from the commonality of information in multiple modalities, such as vision and language. While it has proven effective for learning generalisable…

Machine Learning · Computer Science 2021-04-22 Yuge Shi, Brooks Paige, Philip H. S. Torr, N. Siddharth

Multimodal Variational Autoencoders have emerged as a popular tool to extract effective representations from rich multimodal data. However, such models rely on fusion strategies in latent space that destroy the joint statistical structure…

Machine Learning · Computer Science 2026-03-03 Federico Caretti, Guido Sanguinetti

Unity by Diversity: Improved Representation Learning in Multimodal VAEs

Variational Autoencoders for multimodal data hold promise for many tasks in data analysis, such as representation learning, conditional generation, and imputation. Current architectures either share the encoder output, decoder input, or…

Machine Learning · Computer Science 2025-01-08 Thomas M. Sutter, Yang Meng, Andrea Agostini, Daphné Chopard, Norbert Fortin, Julia E. Vogt, Babak Shahbaba, Stephan Mandt

Multimodal Variational Autoencoders for Semi-Supervised Learning: In Defense of Product-of-Experts

Multimodal generative models should be able to learn a meaningful latent representation that enables a coherent joint generation of all modalities (e.g., images and text). Many applications also require the ability to accurately sample…

Machine Learning · Computer Science 2021-08-02 Svetlana Kutuzova, Oswin Krause, Douglas McCloskey, Mads Nielsen, Christian Igel

Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking

Continuous multimodal representations suitable for multimodal information retrieval are usually obtained with methods that heavily rely on multimodal autoencoders. In video hyperlinking, a task that aims at retrieving video segments, the…

Multimedia · Computer Science 2017-05-16 Vedran Vukotic, Christian Raymond, Guillaume Gravier

Unbiased Learning of Deep Generative Models with Structured Discrete Representations

By composing graphical models with deep learning architectures, we learn generative models with the strengths of both frameworks. The structured variational autoencoder (SVAE) inherits structure and interpretability from graphical models,…

Machine Learning · Computer Science 2023-11-15 Harry Bendekgey, Gabriel Hope, Erik B. Sudderth

Generalizing Multimodal Variational Methods to Sets

Making sense of multiple modalities can yield a more comprehensive description of real-world phenomena. However, learning the co-representation of diverse modalities is still a long-standing endeavor in emerging machine learning…

Artificial Intelligence · Computer Science 2022-12-21 Jinzhao Zhou, Yiqun Duan, Zhihong Chen, Yu-Cheng Chang, Chin-Teng Lin

Learning Factorized Multimodal Representations

Learning multimodal representations is a fundamentally complex research problem due to the presence of multiple heterogeneous sources of information. Although the presence of multiple modalities provides additional valuable information,…

Machine Learning · Computer Science 2019-05-15 Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, Ruslan Salakhutdinov