Related papers: Multimodal Generative Models for Scalable Weakly-S…

Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models

Learning generative models that span multiple data modalities, such as vision and language, is often motivated by the desire to learn more useful, generalisable representations that faithfully capture common underlying factors between the…

Machine Learning · Statistics 2019-11-11 Yuge Shi, N. Siddharth, Brooks Paige, Philip H. S. Torr

Joint Multimodal Learning with Deep Generative Models

We investigate deep generative models that can exchange multiple modalities bi-directionally, e.g., generating images from corresponding texts and vice versa. Recently, some studies handle multiple modalities on deep generative models, such…

Machine Learning · Statistics 2016-11-08 Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo

Generalizing Multimodal Variational Methods to Sets

Making sense of multiple modalities can yield a more comprehensive description of real-world phenomena. However, learning the co-representation of diverse modalities is still a long-standing endeavor in emerging machine learning…

Artificial Intelligence · Computer Science 2022-12-21 Jinzhao Zhou, Yiqun Duan, Zhihong Chen, Yu-Cheng Chang, Chin-Teng Lin

MHVAE: a Human-Inspired Deep Hierarchical Generative Model for Multimodal Representation Learning

Humans are able to create rich representations of their external reality. Their internal representations allow for cross-modality inference, where available perceptions can induce the perceptual experience of missing input modalities. In…

Machine Learning · Computer Science 2020-06-05 Miguel Vasco, Francisco S. Melo, Ana Paiva

Multimodal Generative Models for Compositional Representation Learning

As deep neural networks become more adept at traditional tasks, many of the most exciting new challenges concern multimodality: observations that combine diverse types, such as image and text. In this paper, we introduce a family of…

Machine Learning · Computer Science 2019-12-12 Mike Wu, Noah Goodman

Multi-Modal Anomaly Detection for Unstructured and Uncertain Environments

To achieve high levels of autonomy, modern robots require the ability to detect and recover from anomalies and failures with minimal human supervision. Multi-modal sensor signals could provide more information for such anomaly detection…

Robotics · Computer Science 2020-12-17 Tianchen Ji, Sri Theja Vuppala, Girish Chowdhary, Katherine Driggs-Campbell

Improving Bi-directional Generation between Different Modalities with Variational Autoencoders

We investigate deep generative models that can exchange multiple modalities bi-directionally, e.g., generating images from corresponding texts and vice versa. A major approach to achieve this objective is to train a model that integrates…

Machine Learning · Statistics 2018-01-29 Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo

On the Limitations of Multimodal VAEs

Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. Yet, despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs,…

Machine Learning · Computer Science 2022-04-08 Imant Daunhawer, Thomas M. Sutter, Kieran Chin-Cheong, Emanuele Palumbo, Julia E. Vogt

Analyzing Multimodal Integration in the Variational Autoencoder from an Information-Theoretic Perspective

Human perception is inherently multimodal. We integrate, for instance, visual, proprioceptive and tactile information into one experience. Hence, multimodal learning is of importance for building robotic systems that aim at robustly…

Machine Learning · Computer Science 2024-11-04 Carlotta Langer, Yasmin Kim Georgie, Ilja Porohovoj, Verena Vanessa Hafner, Nihat Ay

Multimodal Variational Autoencoders for Semi-Supervised Learning: In Defense of Product-of-Experts

Multimodal generative models should be able to learn a meaningful latent representation that enables a coherent joint generation of all modalities (e.g., images and text). Many applications also require the ability to accurately sample…

Machine Learning · Computer Science 2021-08-02 Svetlana Kutuzova, Oswin Krause, Douglas McCloskey, Mads Nielsen, Christian Igel

Multimodal ELBO with Diffusion Decoders

Multimodal variational autoencoders have demonstrated their ability to learn the relationships between different modalities by mapping them into a latent representation. Their design and capacity to perform any-to-any conditional and…

Machine Learning · Computer Science 2025-02-04 Daniel Wesego, Pedram Rooshenas

Unbiased Learning of Deep Generative Models with Structured Discrete Representations

By composing graphical models with deep learning architectures, we learn generative models with the strengths of both frameworks. The structured variational autoencoder (SVAE) inherits structure and interpretability from graphical models,…

Machine Learning · Computer Science 2023-11-15 Harry Bendekgey, Gabriel Hope, Erik B. Sudderth

Learning Latent Subspaces in Variational Autoencoders

Variational autoencoders (VAEs) are widely used deep generative models capable of learning unsupervised latent representations of data. Such representations are often difficult to interpret or control. We consider the problem of…

Machine Learning · Computer Science 2018-12-18 Jack Klys, Jake Snell, Richard Zemel

Enhancing Unimodal Latent Representations in Multimodal VAEs through Iterative Amortized Inference

Multimodal variational autoencoders (VAEs) aim to capture shared latent representations by integrating information from different data modalities. A significant challenge is accurately inferring representations from any subset of modalities…

Machine Learning · Computer Science 2024-10-16 Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo

Disentangling by Partitioning: A Representation Learning Framework for Multimodal Sensory Data

Multimodal sensory data resembles the form of information perceived by humans for learning, and is easy to obtain in large quantities. Compared to unimodal data, synchronization of concepts between modalities in such data provides…

Machine Learning · Statistics 2018-05-30 Wei-Ning Hsu, James Glass

Multimodal Variational Autoencoders have emerged as a popular tool to extract effective representations from rich multimodal data. However, such models rely on fusion strategies in latent space that destroy the joint statistical structure…

Machine Learning · Computer Science 2026-03-03 Federico Caretti, Guido Sanguinetti

A survey of multimodal deep generative models

Multimodal learning is a framework for building models that make predictions based on different types of modalities. Important challenges in multimodal learning are the inference of shared representations from arbitrary modalities and…

Machine Learning · Computer Science 2022-07-06 Masahiro Suzuki, Yutaka Matsuo

Generalized Zero-Shot Learning using Multimodal Variational Auto-Encoder with Semantic Concepts

With the ever-increasing amount of data, the central challenge in multimodal learning involves limitations of labelled samples. For the task of classification, techniques such as meta-learning, zero-shot learning, and few-shot learning…

Computer Vision and Pattern Recognition · Computer Science 2021-06-29 Nihar Bendre, Kevin Desai, Peyman Najafirad

Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit

Multimodal Variational Autoencoders (VAEs) have been the subject of intense research in the past years as they can integrate multiple modalities into a joint representation and can thus serve as a promising tool for both data classification…

Machine Learning · Computer Science 2024-09-18 Gabriela Sejnova, Michal Vavrecka, Karla Stepanova, Tadahiro Taniguchi

Learning Multimodal VAEs through Mutual Supervision

Multimodal VAEs seek to model the joint distribution over heterogeneous data (e.g. vision, language), whilst also capturing a shared representation across such modalities. Prior work has typically combined information from the modalities…

Machine Learning · Computer Science 2022-12-19 Tom Joy, Yuge Shi, Philip H. S. Torr, Tom Rainforth, Sebastian M. Schmon, N. Siddharth